Characterizing Optimal Sampling of Binary Contingency Tables 

via the Configuration Model

Abstract

A binary contingency table is an $m \times n$ array of binary entries with row sums $r = (r_1, \ldots, r_m)$ and column sums $c = (c_1, \ldots, c_n)$. The configuration model generates a contingency table by considering $r_i$ tokens of type 1 for each row $i$ and $c_j$ tokens of type 2 for each column $j$, and then taking a uniformly random pairing between type-1 and type-2 tokens. We give a necessary and sufficient condition so that the probability that the configuration model outputs a binary contingency table remains bounded away from 0 as $N = \sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j$ goes to $\infty$. Our finding shows surprising differences from recent results for binary symmetric contingency tables.

1 Introduction

Given two natural numbers $m$ and $n$, let $r = (r_1, r_2, \ldots, r_m)$ and $c = (c_1, c_2, \ldots, c_n)$ be vectors of positive integers such that $\sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j = N$. Let $\Omega_{r,c}$ be the set of matrices with binary entries such that the sum of the $i$-th row is given by $r_i$ and the sum of the $j$-th column is given by $c_j$. These matrices are known as binary contingency tables. We consider the problem of sampling uniformly from $\Omega_{r,c}$ and of computing $|\Omega_{r,c}|$.



A binary contingency table can be used to represent the adjacency matrix of a bipartite graph. Therefore, the problem of sampling uniformly from $\Omega_{r,c}$ is equivalent to uniformly sampling a bipartite graph with $m + n$ nodes such that the node degrees in one partition are given by $r_1, r_2, \ldots, r_m$ and the node degrees in the other partition are given by $c_1, c_2, \ldots, c_n$ (see Figure 1(a) and (b) for an example).

We study the following well-known and simple algorithm for sampling contingency tables, which is usually referred to as the configuration model and was introduced by Bollobás [8]. For each row $i$, consider $r_i$ tokens of type 1, and for each column $j$, consider $c_j$ tokens of type 2. Then, construct a table $T$ by sampling a uniformly random matching between type-1 and type-2 tokens. In other words, first order the type-1 tokens in some arbitrary manner and draw a uniformly random permutation of the type-2 tokens. Then, match type-1 and type-2 tokens according to their positions in the permutation. In this way, the entry $T_{i,j}$ is taken to be the number of type-1 tokens from row $i$ that were matched to type-2 tokens from column $j$ (see Figure 1(c) for an example). In the context of sampling bipartite graphs, the tokens are usually referred to as half-edges and the matching establishes a pairing between half-edges of one partition and half-edges of the other partition.

Figure 1: (a) A binary contingency table for the sequences $r = \{3, 2, 1, 1\}$ and $c = \{2, 2, 1, 1, 1\}$. (b) The bipartite graph corresponding to the table of part (a), where the leftmost partition represents the rows of the table and the rightmost partition represents the columns of the table. (c) A possible pairing between type-1 and type-2 tokens of the configuration model that corresponds to the table of part (a). The labels in the type-1 and type-2 tokens represent the index of the row and column, respectively, corresponding to that token.
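The sampling procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `configuration_model` and `is_binary` are ours, rows and columns are indexed from 0, and the table is stored as a list of lists.

```python
import random


def configuration_model(r, c, rng=random):
    """Sample a (not necessarily binary) table with row sums r and column sums c."""
    if sum(r) != sum(c):
        raise ValueError("row and column sums must agree")
    # One type-1 token per unit of row sum, labelled by its row index.
    row_tokens = [i for i, ri in enumerate(r) for _ in range(ri)]
    # One type-2 token per unit of column sum, labelled by its column index.
    col_tokens = [j for j, cj in enumerate(c) for _ in range(cj)]
    # A uniformly random permutation of the type-2 tokens defines the matching.
    rng.shuffle(col_tokens)
    table = [[0] * len(c) for _ in r]
    for i, j in zip(row_tokens, col_tokens):
        table[i][j] += 1
    return table


def is_binary(table):
    """True when every entry of the table is 0 or 1."""
    return all(entry <= 1 for row in table for entry in row)
```

Calling `configuration_model([3, 2, 1, 1], [2, 2, 1, 1, 1])` produces a table with the row and column sums of Figure 1; whether it is binary depends on the random matching.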

The configuration model produces a table in $O(N)$ time, but may output a non-binary table, which would correspond to a bipartite graph with more than one edge between the same pair of vertices. Yet, given that the table generated is binary, the output table is a uniform sample from $\Omega_{r,c}$. In order to see this, note that the table does not change if we take the permutation and switch the position of two type-2 tokens corresponding to the same column or if we switch the position of two type-2 tokens that are matched to type-1 tokens from the same row. Thus, there are exactly $\prod_{i=1}^{m} r_i! \prod_{j=1}^{n} c_j!$ different permutations of type-2 tokens for any given binary contingency table. Since this number does not depend on the table, but only on the sequences $r$ and $c$, we obtain that any given binary contingency table has the same probability of being generated by the configuration model. Therefore, if the probability that the configuration model outputs a binary table does not go to zero as $N \to \infty$, we obtain both an exact sampler for the uniform distribution on $\Omega_{r,c}$ and, as explained in Section 3, a randomized algorithm to approximate $|\Omega_{r,c}|$ that runs in time $O(N)$. We call such a running time optimal for uniform generation on $\Omega_{r,c}$ since it takes at least $N$ elements to encode a given binary table.
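The multiplicity argument above can be checked by brute force on a tiny instance. The sketch below is illustrative only: the helper name `tables_by_permutation` and the example sequences $r = c = (2, 1, 1)$ are our own choices, and the enumeration over all $N!$ permutations is feasible only for very small $N$.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def tables_by_permutation(r, c):
    """Count, for each table, the permutations of type-2 tokens producing it."""
    row_tokens = [i for i, ri in enumerate(r) for _ in range(ri)]
    col_tokens = [j for j, cj in enumerate(c) for _ in range(cj)]
    counts = Counter()
    # Tokens are distinguishable, so permute token positions rather than labels.
    for perm in permutations(range(len(col_tokens))):
        table = [[0] * len(c) for _ in r]
        for i, pos in zip(row_tokens, perm):
            table[i][col_tokens[pos]] += 1
        counts[tuple(map(tuple, table))] += 1
    return counts


r, c = (2, 1, 1), (2, 1, 1)
counts = tables_by_permutation(r, c)
# Keep only the binary tables and compare with prod r_i! * prod c_j!.
binary = {t: k for t, k in counts.items() if max(map(max, t)) <= 1}
expected = prod(factorial(x) for x in r + c)
print(len(binary), set(binary.values()), expected)
```

Every binary table is hit by the same number of permutations, while non-binary tables are hit by fewer, which is why conditioning on a binary output yields a uniform sample.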

We study the asymptotic behavior as $N \to \infty$ of the probability that the configuration model generates a binary table. For this reason, we consider input sequences $(r(N), c(N))_{N \geq 1}$, where for each $N \geq 1$, $r(N)$ and $c(N)$ are vectors of cardinality $m(N)$ and $n(N)$, respectively, whose elements are non-negative integers and satisfy $\sum_{i=1}^{m(N)} r_i(N) = \sum_{j=1}^{n(N)} c_j(N) = N$. We assume that $r_1(N) \geq r_2(N) \geq \cdots \geq r_{m(N)}(N)$ and $c_1(N) \geq c_2(N) \geq \cdots \geq c_{n(N)}(N)$ for all $N$ and allow $m(N)$ and $n(N)$ to go to $\infty$ with $N$. It will be convenient in our development to consider the vectors $r(N)$ and $c(N)$, for fixed $N$, as having infinitely many elements. To this end, we set $r_i(N) = 0$ for $i \geq m(N) + 1$ and $c_j(N) = 0$ for $j \geq n(N) + 1$. Without loss of generality we assume that $r_1(N) \geq c_1(N)$ for all $N$. Furthermore, for each $i$ and $j$, we regard $(r_i)_{N \geq 1}$ and $(c_j)_{N \geq 1}$ as sequences in their own right. Finally, when taking subsequences of $(r(N), c(N))_{N \geq 1}$, it is useful to see the input sequence as a sequence of tuples $(r(\ell), c(\ell), N(\ell))_{\ell \geq 1}$ with $N(\ell) = \ell$ for all $\ell$. With this, we have that

a subsequence of $(r(\ell), c(\ell), N(\ell))_{\ell \geq 1}$ is a sequence of tuples $(r'(\ell'), c'(\ell'), N'(\ell'))_{\ell' \geq 1}$ for which there exist positive integers $k_1 < k_2 < \cdots$ so that $r'(\ell') = r(k_{\ell'})$, $c'(\ell') = c(k_{\ell'})$ and $\sum_{i=1}^{\infty} r'_i(\ell') = \sum_{j=1}^{\infty} c'_j(\ell') = N'(\ell') = N(k_{\ell'})$. (1)

For brevity, we shall drop any explicit dependence on $\ell$ and $N$ from our notation; e.g., we refer to the input sequence as $(r, c)_{N \geq 1}$ and use the full notation $(r(\ell), c(\ell), N(\ell))_{\ell}$ only when talking about subsequences or where ambiguity may arise.

Our main result characterizes the class of input sequences $(r, c)_N$ for which the configuration model takes $\Theta(N)$ time to sample uniformly from $\Omega_{r,c}$ and to approximate $|\Omega_{r,c}|$ as $N \to \infty$. Note that if $c_1 = 1$, all the tables satisfying $r$ and $c$ are binary, so not only does the configuration model generate only binary contingency tables, but $|\Omega_{r,c}|$ can also be obtained trivially. Thus, we assume that $r_1 \geq c_1 \geq 2$ for all $N$.

For any sequence of tuples $S = (r(\ell), c(\ell), N(\ell))_{\ell \geq 1}$ we define $\kappa(S)$ to be the first row having sum $o(N)$; more formally,

$$\kappa(S) = \min\{i \geq 1 : r_i(\ell) = o(N(\ell)) \text{ as } \ell \to \infty\}. \tag{2}$$

We now state our main result, which gives necessary and sufficient conditions for the optimality of 
the configuration model. 

Theorem 1.1. Let $T$ be a table produced by the configuration model given the input sequence $S = (r, c)_N$. We have that $P(T \in \Omega_{r,c}) = \Omega(1)$ as $N \to \infty$ if and only if the following two conditions hold:

1. $\sum_{i=1}^{m} \sum_{j=1}^{n} r_i(r_i - 1) c_j(c_j - 1) = O(N^2)$.

2. For every subsequence $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $S$ we have that either

$$\limsup_{\ell' \to \infty} \frac{\sum_{i=\kappa(S')}^{\infty} r'_i(\ell')}{N'(\ell')} > 0 \quad \text{or} \quad \limsup_{\ell' \to \infty} c'_1(\ell') < \kappa(S').$$



Remark 1.2. We point out that $\kappa(S)$ may be $\infty$ (e.g., consider $r_i = \lfloor N / 2^i \rfloor$ for all $i \in \{1, 2, \ldots, \log_2 N\}$ and $r_i = 1$ for as many values of $i > \log_2 N$ as needed to make the sum of the $r_i$'s equal to $N$). In this case, we define $\sum_{i=\kappa(S)}^{\infty} r_i = 0$.

Remark 1.3. When $r_1 = o(N)$, condition 2 above is always satisfied since $\kappa(S') = 1$ and $\sum_{i=\kappa(S')}^{\infty} r'_i(\ell') = N'(\ell')$ for every subsequence $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $S$. Thus, in this case condition 1 alone is both necessary and sufficient.

Remark 1.4. Note that, for the practically relevant case where $(r_i)_N$ and $(c_j)_N$ are non-decreasing with $N$ for all $i$ and $j$, condition 2 can be replaced by the simpler condition

$$\sum_{i=\kappa(S)}^{\infty} r_i = \Omega(N) \quad \text{or} \quad \lim_{N \to \infty} c_1 < \kappa(S),$$

which concerns only the sequence $(r, c)_N$ and not every subsequence of $(r, c)_N$.



Remark 1.5. Note that conditions 1 and 2 are not redundant. For instance, for any sequence $S = (r, c)_N$ with $r_1 = N - o(N)$, $c_1 = 2$ and $c_j = 1$ for all $j = 2, 3, \ldots, N - 1$, we have $\kappa(S) = 2$, which violates condition 2 though condition 1 holds.
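The example in Remark 1.5 can be made concrete. When $c = (2, 1, \ldots, 1)$, the table is non-binary exactly when the two type-2 tokens of column 1 are matched to the same row, so $P(T \in \Omega_{r,c}) = 1 - \sum_i r_i(r_i - 1) / (N(N-1))$. The sketch below evaluates this closed form together with the normalized sum from condition 1; the function names and the specific choice $r_1 = N - \lfloor \sqrt{N} \rfloor$ are illustrative assumptions of ours.

```python
from fractions import Fraction
from math import isqrt


def prob_binary_single_heavy_column(r):
    """Exact P(T binary) when c = (2, 1, ..., 1)."""
    n_tokens = sum(r)
    # Both tokens of column 1 land in row i with probability r_i (r_i - 1) / (N (N - 1)).
    same_row = sum(ri * (ri - 1) for ri in r)
    return 1 - Fraction(same_row, n_tokens * (n_tokens - 1))


def condition_one_ratio(r, c):
    """sum_{i,j} r_i (r_i - 1) c_j (c_j - 1) / N^2; the double sum factorizes."""
    n_tokens = sum(r)
    total = sum(ri * (ri - 1) for ri in r) * sum(cj * (cj - 1) for cj in c)
    return Fraction(total, n_tokens ** 2)


for n_tokens in (10**2, 10**4, 10**6):
    small = isqrt(n_tokens)                # o(N) mass outside the first row
    r = [n_tokens - small] + [1] * small   # r_1 = N - o(N)
    c = [2] + [1] * (n_tokens - 2)
    print(n_tokens,
          float(condition_one_ratio(r, c)),
          float(prob_binary_single_heavy_column(r)))
```

Under these assumptions the ratio in condition 1 stays below 2 while the probability of a binary table shrinks as $N$ grows, in line with the remark.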

Our theoretical developments are partly driven by our desire to guide practitioners in areas of applied 
statistics who often deal with hypothesis testing involving graphical models and binary contingency 
tables (see for instance [3] and [Hi]). In these types of settings, data is encoded in the form of 
a binary table and one is interested in studying the null hypothesis that row and column sums 
are sufficient statistics for determining the distribution of all the entries in the table. To test this 
hypothesis statisticians compare the value of a given statistic of the observed table (e.g., the sum 
of the Hamming distances of pairs of rows) with values generated by sampling tables under the
distribution induced by the null hypothesis, which is precisely the uniform distribution on binary
contingency tables with prescribed row and column sums. For its simplicity and small running time,
the configuration model is a very appealing algorithm to be used in this setting. Our Theorem 1.1
above fully characterizes the sequences r and c for which the configuration model is a fast and reliable 
algorithm for uniform generation of binary contingency tables. 

The configuration model is by now a classical, well-known algorithm that has been applied in practice, as described above, and also in more theoretical settings. For example, some asymptotic estimates for $|\Omega_{r,c}|$ (e.g., [17] and [13]) are obtained via analyses of the configuration model. Some results on the structural properties of graphs obtained uniformly at random from $\Omega_{r,c}$ also use the configuration model (e.g., [15, Chapter 9] and [12]). Usually, it is easier to analyze a graph obtained via the configuration model than a random sample from $\Omega_{r,c}$, and it is important to know whether results for one model can be carried over to the other. In order to explain how our results apply to this type of question, let $A$ be any property that can be tested for a bipartite graph (e.g., $A$ can be the property that the graph has a connected component with a constant fraction of the vertices, which is the property studied in [12]). If the conditions in Theorem 1.1 hold, then any property $A$ that holds with probability $1 - o(1)$ for the configuration model also holds with probability $1 - o(1)$ for a graph obtained uniformly at random from $\Omega_{r,c}$. This corresponds to the notion of contiguity between probability measures, which is more thoroughly explained in [15, Chapter 9]. The corollary below gives an application of our results. We remark that this can only be obtained since the configuration model is an exact sampler for the uniform distribution over $\Omega_{r,c}$.

Corollary 1.6. Let $A$ be a property that can be tested for a bipartite graph. Let $p(A)$ be the probability that a graph obtained uniformly at random from $\Omega_{r,c}$ has property $A$, and $p'(A)$ be the probability that a graph obtained via the configuration model given $r$ and $c$ has property $A$. If conditions 1 and 2 in Theorem 1.1 are satisfied and $p'(A) = 1 - o(1)$, then $p(A) = 1 - o(1)$.

Proof. Let $p$ be the probability that the configuration model outputs a binary table. Note that $p = \Omega(1)$ if conditions 1 and 2 in Theorem 1.1 are satisfied. Since the configuration model is an exact sampler for the uniform distribution over $\Omega_{r,c}$, the probability that it outputs a binary table without property $A$ is $p(1 - p(A))$, which is at most $1 - p'(A)$. Hence $p(A) \geq 1 - \frac{1 - p'(A)}{p} = 1 - o(1)$. □

2 Related Work 

Theorem 1.1 can be seen as an extension of recent work by Janson [14], who studied the probability that the configuration model generates a binary symmetric table. Letting $\Omega'_r$ be the set of all binary symmetric tables with row and column sums given by $r$, [14, Theorem 1.1] establishes that $P(T \in \Omega'_r) = \Omega(1)$ if and only if $\sum_{i=1}^{m} r_i^2 = O(N)$.



To contrast Janson's result with the case of non-symmetric tables studied here, note that sequences $(r, c)_N$ satisfying conditions 1 and 2 give rise to a much wider class of behavior than in the symmetric case. For instance, the apparently similar conditions $\sum_{i=1}^{m} \sum_{j=1}^{n} r_i(r_i - 1) c_j(c_j - 1) = O(N^2)$ and $\sum_{i=1}^{m} \sum_{j=1}^{n} r_i^2 c_j^2 = O(N^2)$ are far from identical; if $c = \{2, 1, 1, \ldots, 1\}$, then the former condition is satisfied regardless of $r$ while the latter may not hold. Besides, the condition $\sum_{i=1}^{m} r_i^2 = O(N)$ for symmetric tables allows $r_1$ to grow only as $O(\sqrt{N})$, whereas our Theorem 1.1 reveals that there are sequences with $r_1$ as large as $N - o(N)$ for which the configuration model produces a binary table with probability $\Omega(1)$. Therefore, the growth behavior allowed for $r_1$ in Theorem 1.1 as $N \to \infty$ is much wider than in the symmetric case. This wider type of growth behavior makes the analysis for the non-symmetric case qualitatively different. Moreover, our proof techniques are completely different from those employed by Janson and reveal some structural properties of the tables generated with the configuration model. For example, we show that, conditioning on the entries with relatively large row and column sums being binary, the probability that there is an entry with value larger than 2 is tiny (see Lemma 6.5). We believe that our techniques can be exploited in the analysis of related problems (such as efficient sampling of non-binary contingency tables).
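To see how far apart the two "apparently similar" sums can be, the following sketch evaluates both for $c = (2, 1, \ldots, 1)$. The row sums $r = (N/2, N/2)$ and the function names are hypothetical choices made only for illustration.

```python
def falling_sum(r, c):
    """sum_{i,j} r_i (r_i - 1) c_j (c_j - 1), using that the double sum factorizes."""
    return sum(x * (x - 1) for x in r) * sum(y * (y - 1) for y in c)


def square_sum(r, c):
    """sum_{i,j} r_i^2 c_j^2, which also factorizes."""
    return sum(x * x for x in r) * sum(y * y for y in c)


n_tokens = 1000
r = [n_tokens // 2, n_tokens // 2]   # two heavy rows (illustrative choice)
c = [2] + [1] * (n_tokens - 2)       # one column of sum 2, the rest of sum 1
print(falling_sum(r, c) / n_tokens**2, square_sum(r, c) / n_tokens**2)
```

The first ratio remains bounded because only column 1 contributes to $c_j(c_j - 1)$, whereas the second picks up a contribution from every column of sum 1 and therefore grows with $N$.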

Polynomial-time algorithms have been developed for the problem of approximating $|\Omega_{r,c}|$. In fact, approximating $|\Omega_{r,c}|$ can be reduced to the problem of computing the permanent of a binary $\ell \times \ell$ matrix with $\ell = O(mn)$, a problem that enjoys a notable history and place in the theory of computation. Valiant [22] showed that computing the permanent belongs to the class of #P-complete problems, for which proving the existence of a polynomial-time algorithm would have extensive implications in complexity theory. It is still an open problem, however, to verify whether counting the number of binary contingency tables is #P-complete, though the more general problem of counting the number of (not necessarily binary) contingency tables has been shown to be #P-complete by Dyer et al. [11]. The ground-breaking work of Jerrum et al. [16] provided the first Fully Polynomial Randomized Approximation Scheme (FPRAS) [19] to compute the permanent of a binary matrix. Bezáková et al. [6] used simulated annealing techniques to develop an asymptotically faster algorithm to approximate the permanent, which runs in $O(\ell^7 \log^4 \ell)$ time for an $\ell \times \ell$ matrix. In another paper, Bezáková et al. [5] developed an algorithm that works directly with contingency tables. Their algorithm for approximately sampling binary tables runs in $O(m^2 n^2 N^3 \Delta \log^5(m + n))$ time, where $\Delta$ is the maximum over all row and column sums.

Although these algorithms are proved to run in polynomial time for all $r$ and $c$, their efficiency is far from being useful in the types of applications described at the end of Section 1. For this reason, other approaches to uniformly sampling and counting binary contingency tables have been proposed. Chen et al. [10] developed a sequential importance sampling algorithm to count the number of contingency tables. Their algorithm applies a heuristic construction and has been observed to perform well in practice, but Bezáková et al. [5] proved that there exist $r$ and $c$ such that the heuristic of Chen et al. [10] underestimates the number of binary contingency tables by an exponential factor unless the algorithm is run for an exponential amount of time. On the other hand, Blanchet [7] provided a rigorous analysis of the heuristic of Chen et al. [10] and showed that if $r_1 = o(\sqrt{N})$, $\sum_{i=1}^{m} r_i^2 = O(N)$, and $c_1 = O(1)$, then this approach yields an FPRAS for counting binary contingency tables with running time $O(N^3)$. Our Theorem 1.1 significantly weakens the assumptions in [7], and drastically improves upon the running time of all the aforementioned algorithms.

In a different direction, much effort has been made to derive asymptotics for $|\Omega_{r,c}|$. The first result to allow the row and column sums to grow with $N$ is the one by O'Neil [20], which is restricted to the case $n = m$ and $r_1 = O(\log^{1/4 - \epsilon} n)$ for any constant $\epsilon > 0$. Later, McKay [17] considered the case $r_1 = o(N^{1/4})$ and derived the first asymptotics for $|\Omega_{r,c}|$ to allow $r_1$ to grow polynomially with $N$. Currently, the asymptotic formula for sparse binary tables that allows the largest range for $r$ and $c$ is the one by Greenhill et al. [13] for the case $r_1 c_1 = o(N^{2/3})$. These results by McKay [17] and Greenhill et al. [13] were obtained using the configuration model as a part of their proof technique. Similarly, the work of Blanchet discussed above [7] also uses the configuration model, as well as McKay's estimator [17], to analyze the heuristics of Chen et al. [10]. Using different techniques, Canfield et al. [9] derived asymptotics for dense binary tables, and Barvinok [1] derived general lower and upper bounds for $|\Omega_{r,c}|$ that are within a factor $(mn)^{O(m+n)}$ from each other. For binary symmetric tables, besides the work of Janson [14] cited above, we highlight the work of Bayati et al. [2], who developed an algorithm that generates a symmetric table almost uniformly at random in time $O(r_1 N)$ as long as $r_1 = c_1 = O(N^{1/4 - \epsilon})$ for any constant $\epsilon > 0$. Their analysis gives an alternative proof of a result originally derived by McKay [18].



We remark that, to the best of our knowledge, none of the existing asymptotics for $|\Omega_{r,c}|$ applies to the whole spectrum of sequences $r$ and $c$ that satisfy conditions 1 and 2 in our Theorem 1.1. Furthermore, most of the known asymptotics take advantage of the configuration model in a fundamental way. Since our result fully characterizes the sequences for which the configuration model is contiguous to the uniform distribution, our conditions shed light on the whole spectrum of sequences for which analytical estimators might be obtained by directly applying the configuration model.

Under the conditions of Theorem 1.1, the configuration model gives an FPRAS for approximating $|\Omega_{r,c}|$; thus it approximates $|\Omega_{r,c}|$ to a precision of the form $1 + O(N^{-c})$, for an arbitrarily large constant $c > 0$,¹ whereas asymptotics for $|\Omega_{r,c}|$ have fixed precision. We remark that asymptotics for $|\Omega_{r,c}|$ can also be used to produce an almost uniform sampling procedure for binary contingency tables. Sinclair and Jerrum [21] showed that for any self-reducible problem,² an asymptotic approximation with at least constant precision can be used to produce an almost uniform sampling procedure. However, the running time of the sampling procedure depends on the mixing time of a Markov chain, which not only may be challenging to obtain precisely but also is usually too large for many practical applications. Moreover, we remark that this technique cannot be directly employed with the current asymptotics for $|\Omega_{r,c}|$ since they impose some conditions on $r$ and $c$. Under these conditions, the problem of sampling binary contingency tables is not guaranteed to be self-reducible: when splitting the table into smaller tables, we do not necessarily obtain that the new row and column sums satisfy the conditions of the asymptotic results.



3 Preliminaries 

As mentioned in Section 1, we use the configuration model to generate a contingency table $T$ (not necessarily binary). There are $N!$ possible matchings among the tokens, but any given binary contingency table generated by the configuration model corresponds to $\prod_{i=1}^{m} r_i! \prod_{j=1}^{n} c_j!$ such matchings, since permuting the tokens within each row or column does not change the final table. Therefore, we can conclude that $|\Omega_{r,c}| \prod_{i=1}^{m} r_i! \prod_{j=1}^{n} c_j! = P(T \in \Omega_{r,c}) \, N!$, and the problem of computing $|\Omega_{r,c}|$ is equivalent to evaluating $P(T \in \Omega_{r,c})$.

¹ We remark that under the conditions of Theorem 1.1 the configuration model approximates $|\Omega_{r,c}|$ to a precision of the form $1 \pm \epsilon$ for any constant $\epsilon > 0$ in time $O(N)$, but can approximate $|\Omega_{r,c}|$ to a precision $1 + O(N^{-c})$ for an arbitrary constant $c > 0$ in polynomial time.

² Informally, a problem is self-reducible if it can be split in parts where each part is itself a smaller instance of the same problem. In the case of sampling binary contingency tables, after generating all the entries of a given column, we can update the row and column sums properly so that generating the remaining entries translates to sampling a binary contingency table with different row and column sums.

If $P(T \in \Omega_{r,c}) = \Omega(1)$, then we obtain a Fully Polynomial Randomized Approximation Scheme (FPRAS) for estimating $|\Omega_{r,c}|$ as follows (we refer the reader to [19] for more information on FPRAS). Generate a sequence of independent contingency tables using the configuration model and output the fraction of the tables that turn out to be binary. If $P(T \in \Omega_{r,c}) = \Omega(1)$, for any constants $\delta > 0$ and $\epsilon > 0$, it suffices to generate a constant (depending polynomially on $\epsilon^{-1}$ and $\log \delta^{-1}$) number of tables such that with probability $1 - \delta$ our estimator of $|\Omega_{r,c}|$ has precision $1 \pm \epsilon$.
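A minimal sketch of the estimator described above: it samples matchings, records the fraction that yield a binary table, and rescales that fraction by $N! / (\prod_i r_i! \prod_j c_j!)$. The function name `estimate_count`, the `samples` parameter and the use of exact rationals are our own illustrative choices; the sketch does not choose the number of samples from $\epsilon$ and $\delta$.

```python
import random
from fractions import Fraction
from math import factorial, prod


def estimate_count(r, c, samples, rng=random):
    """Estimate |Omega_{r,c}| from the fraction of binary tables among samples."""
    row_tokens = [i for i, ri in enumerate(r) for _ in range(ri)]
    col_tokens = [j for j, cj in enumerate(c) for _ in range(cj)]
    binary_hits = 0
    for _ in range(samples):
        rng.shuffle(col_tokens)
        # The table is binary iff no (row, column) pair is matched twice.
        pairs = list(zip(row_tokens, col_tokens))
        binary_hits += len(set(pairs)) == len(pairs)
    p_hat = Fraction(binary_hits, samples)
    multiplicity = prod(factorial(x) for x in list(r) + list(c))
    # Rescale the estimated probability by N! / (prod r_i! * prod c_j!).
    return p_hat * Fraction(factorial(len(row_tokens)), multiplicity)
```

For instance, `float(estimate_count((2, 1, 1), (2, 1, 1), 5000))` should land close to the true number of binary tables for those margins, with the usual Monte Carlo fluctuation.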

We conclude this section by introducing fundamental notation that we will use in the proof. Let $I$ be the index set $[1, m] \times [1, n]$ and $T = (T_{i,j})_{(i,j) \in I}$ be a table generated by the configuration model. Let $Z$ be the number of non-binary entries of $T$, so $P(T \in \Omega_{r,c}) = P(Z = 0)$. Given two integers $k \geq 0$ and $x \geq 0$, we define $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$, which equals $\frac{x!}{(x-k)!}$ when $k \leq x$. Recall that the configuration model generates a table by taking a random matching between type-1 and type-2 tokens. We assume that each token is individually labeled and refer to a single pair of a type-1 and a type-2 token as an edge. We say that an edge is matched by the configuration model if the corresponding tokens are matched. A set of two edges for the same entry is referred to as a double edge. For $(i, j) \in I$, let $\mathcal{B}_2(i, j)$ be the set of all possible double edges that can be matched for the entry $(i, j)$. An element of $\mathcal{B}_2(i, j)$ has the form $\{e_1, e_2\}$, where $e_1$ and $e_2$ are disjoint edges for the entry $(i, j)$, that is, $e_1$ and $e_2$ correspond to 4 distinct tokens, 2 type-1 tokens from row $i$ and 2 type-2 tokens from column $j$. Clearly, the cardinality of $\mathcal{B}_2(i, j)$ is given by $|\mathcal{B}_2(i, j)| = r_i^{\underline{2}} c_j^{\underline{2}} / 2!$. For any $(i, j) \in I$ and $B \in \mathcal{B}_2(i, j)$, let $\mathcal{M}(B)$ be the event that the double edge represented by $B$ is matched by the configuration model. Note that given any specific $B \in \mathcal{B}_2(i, j)$, $P(\mathcal{M}(B)) = 1 / N^{\underline{2}}$. With this notation, note that the event $\{Z \geq 1\}$ is equivalent to $\bigcup_{(i,j) \in I} \bigcup_{B \in \mathcal{B}_2(i,j)} \mathcal{M}(B)$.
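The count of double edges for a single entry can be verified by direct enumeration. The sketch below is illustrative: tokens are represented by integer labels, `double_edges` and `falling` are hypothetical helper names, and the formula checked is $r_i^{\underline{2}} c_j^{\underline{2}} / 2$ as read from the definition above.

```python
from itertools import combinations, product


def double_edges(r_i, c_j):
    """Enumerate B_2(i, j) for an entry whose row has r_i tokens and column c_j tokens."""
    edges = list(product(range(r_i), range(c_j)))  # (type-1 token, type-2 token)
    return [
        (e1, e2)
        for e1, e2 in combinations(edges, 2)
        if e1[0] != e2[0] and e1[1] != e2[1]  # the two edges use four distinct tokens
    ]


def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


for r_i, c_j in [(2, 2), (3, 2), (4, 3)]:
    print(r_i, c_j, len(double_edges(r_i, c_j)), falling(r_i, 2) * falling(c_j, 2) // 2)
```

Entries whose row or column sum is at most 1 admit no double edge, which is consistent with the formula evaluating to 0 in that case.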

4 Proof of Theorem 1.1

The proof of Theorem 1.1 follows from the three propositions below, which we will prove in subsequent sections. The first proposition, Proposition 4.1, shows that condition 1 is necessary; its proof is given in Section 5.

Proposition 4.1. As $N \to \infty$, if $\sum_{(i,j) \in I} r_i(r_i - 1) c_j(c_j - 1)$ is not $O(N^2)$, then $P(T \in \Omega_{r,c})$ is not $\Omega(1)$.

The proof of Proposition 4.1 highlights the importance of the definition of double edges, since condition 1 in Theorem 1.1 translates to the expected number of double edges in $T$ being uniformly bounded over $N$. Note that for the case of symmetric tables, condition 1 is both necessary and sufficient, while for the non-symmetric case it is just necessary. Now, we assume that $r_1 = o(N)$ and show in Proposition 4.2 that, in this case, condition 1 in Theorem 1.1 is also sufficient. The proof of Proposition 4.2 is presented in Section 6.

Proposition 4.2. As $N \to \infty$, if $\sum_{(i,j) \in I} r_i(r_i - 1) c_j(c_j - 1) = O(N^2)$ and $r_1 = o(N)$, then $P(T \in \Omega_{r,c}) = \Omega(1)$.

If $r_1$ is not $o(N)$, i.e., $\limsup_{N \to \infty} r_1 / N > 0$, then $(r, c)_N$ contains a subsequence $(r'(\ell'), c'(\ell'), N'(\ell'))_{\ell' \geq 1}$ for which $r'_1(\ell') = \Theta(N'(\ell'))$ as $\ell' \to \infty$. (Recall the definition of a subsequence of an input sequence in (1).) The next proposition deals with the case $r_1 = \Theta(N)$ and its proof is presented in Section 7.

Proposition 4.3. As $N \to \infty$, if $\sum_{(i,j) \in I} r_i(r_i - 1) c_j(c_j - 1) = O(N^2)$ and $r_1 = \Omega(N)$, then $P(T \in \Omega_{r,c}) = \Omega(1)$ if and only if, for every subsequence $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell' \geq 1}$ of $S = (r, c)_N$, we have $\limsup_{\ell' \to \infty} \frac{\sum_{i=\kappa(S')}^{\infty} r'_i(\ell')}{N'(\ell')} > 0$ or $\limsup_{\ell' \to \infty} c'_1(\ell') < \kappa(S')$, where $\kappa$ is defined as in (2).



It is clear that Propositions 4.1, 4.2 and 4.3 establish that $P(T \in \Omega_{r,c}) = \Omega(1)$ if and only if both conditions 1 and 2 in Theorem 1.1 are satisfied, whenever $r_1$ is either $o(N)$ or $\Theta(N)$. We now explain the case when $r_1$ is neither $o(N)$ nor $\Theta(N)$, i.e., $\limsup_{N \to \infty} r_1 / N > \liminf_{N \to \infty} r_1 / N = 0$. For this, we will make use of the following technical lemma, which is also used in [15, Chapter 9] and [14].

Lemma 4.4 (Subsubsequence principle). If every subsequence $(r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $(r, c)_N$ contains a further subsequence $(r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ for which $P(T \in \Omega_{r'',c''}) = \Omega(1)$ as $\ell'' \to \infty$, then $P(T \in \Omega_{r,c}) = \Omega(1)$ as $N \to \infty$.

Proof. We will prove this lemma by contradiction. Assume that every subsequence $(r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $(r, c)_N$ contains a further subsequence $(r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ for which $P(T \in \Omega_{r'',c''}) = \Omega(1)$ as $\ell'' \to \infty$, but $P(T \in \Omega_{r,c})$ is not $\Omega(1)$ as $N \to \infty$. This means that $\liminf_{N \to \infty} P(T \in \Omega_{r,c}) = 0$ and, consequently, there exists a subsequence $(\tilde{r}(\tilde{\ell}), \tilde{c}(\tilde{\ell}), \tilde{N}(\tilde{\ell}))_{\tilde{\ell}}$ of $(r, c)_N$ such that $\lim_{\tilde{\ell} \to \infty} P(T \in \Omega_{\tilde{r},\tilde{c}}) = 0$. Every further subsequence of this subsequence also has probability tending to 0, which contradicts the assumption. □

Finally, we will be able to conclude the proof of Theorem 1.1 with Lemma 4.5 below, which uses the subsubsequence principle. Since we will also apply this lemma later on, we give it in more generality than needed here. To avoid ambiguity, we use the full notation discussed in (1) and in the paragraph immediately preceding it.

Lemma 4.5. Let $\mathcal{Z}$ be some space of sequences $(r(\ell), c(\ell), N(\ell))_{\ell}$ indexed by $\ell \geq 1$ such that if $(r(\ell), c(\ell), N(\ell))_{\ell} \in \mathcal{Z}$ then any subsequence of $(r(\ell), c(\ell), N(\ell))_{\ell}$ is also in $\mathcal{Z}$. Given a sequence $S = (r(\ell), c(\ell), N(\ell))_{\ell} \in \mathcal{Z}$, let $f_\ell(S)$ be a sequence of nonnegative real numbers indexed by $\ell \geq 1$. Assume that for all $S = (r(\ell), c(\ell), N(\ell))_{\ell} \in \mathcal{Z}$ such that $f_\ell(S) = \Omega(1)$ or $f_\ell(S) = o(1)$ as $\ell \to \infty$, we have $P(T \in \Omega_{r,c}) = \Omega(1)$ as $N \to \infty$. Then, $P(T \in \Omega_{r,c}) = \Omega(1)$ also holds for each $S = (r(\ell), c(\ell), N(\ell))_{\ell} \in \mathcal{Z}$ for which $\limsup_{\ell \to \infty} f_\ell(S) > \liminf_{\ell \to \infty} f_\ell(S) = 0$.

Proof. We use the subsubsequence principle (Lemma 4.4). If $\limsup_{\ell \to \infty} f_\ell(S) > \liminf_{\ell \to \infty} f_\ell(S) = 0$, then for every subsequence $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $S$ it is the case that $S' \in \mathcal{Z}$ (by the property of $\mathcal{Z}$) and either

$$\limsup_{\ell' \to \infty} f_{\ell'}(S') = 0 \tag{3}$$

or

there is a subsequence $S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''} \in \mathcal{Z}$ of $S'$ for which $\liminf_{\ell'' \to \infty} f_{\ell''}(S'') > 0$. (4)

In the case of (3), since $f_{\ell'}(S') = o(1)$ as $\ell' \to \infty$, we know that $P(T \in \Omega_{r',c'}) = \Omega(1)$. In the case of (4), since $f_{\ell''}(S'') = \Omega(1)$, we have $P(T \in \Omega_{r'',c''}) = \Omega(1)$ as $\ell'' \to \infty$. Therefore, using the subsubsequence principle we obtain $P(T \in \Omega_{r,c}) = \Omega(1)$ as $N \to \infty$. □

We set $\mathcal{Z}$ as the space of sequences satisfying conditions 1 and 2 from Theorem 1.1. It is easy to check that this space satisfies the condition in Lemma 4.5. Then, for $S = (r(\ell), c(\ell), N(\ell))_{\ell} \in \mathcal{Z}$ we set $f_\ell(S) = \frac{r_1(\ell)}{N(\ell)}$, and Lemma 4.5 gives that, as $N \to \infty$, $P(T \in \Omega_{r,c}) = \Omega(1)$ also for the case when $r_1$ is neither $\Theta(N)$ nor $o(N)$ but conditions 1 and 2 hold. This completes the proof of Theorem 1.1.



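5 Proof of Proposition 4.1

We prove Proposition 4.1 using the second-moment method. We define the function

$$\mu(N) = \sum_{(i,j) \in I} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}}, \tag{5}$$

which satisfies $\limsup_{N \to \infty} \mu(N) = \infty$ by the assumptions of Proposition 4.1, and show that under this condition $\liminf_{N \to \infty} P(T \in \Omega_{r,c}) = 0$ (i.e., $P(T \in \Omega_{r,c})$ is not $\Omega(1)$). Let $F$ be the random variable counting the number of double edges that are matched by the configuration model, that is, $F = \sum_{(i,j) \in I} \sum_{B \in \mathcal{B}_2(i,j)} \mathbf{1}(\mathcal{M}(B))$, where $\mathbf{1}(\cdot)$ is the indicator function. Note that

$$E F = \sum_{(i,j) \in I} \sum_{B \in \mathcal{B}_2(i,j)} \frac{1}{N^{\underline{2}}} = \sum_{(i,j) \in I} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}} = \mu(N). \tag{6}$$

Our strategy is to use Chebyshev's inequality to obtain an upper bound for $P(T \in \Omega_{r,c})$ via

$$P(T \in \Omega_{r,c}) = P(F \leq 0) = P(E F - F \geq E F) \leq \frac{\mathrm{Var}(F)}{(E F)^2}. \tag{7}$$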
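The identity $E F = \mu(N)$ can be confirmed exactly on a small instance by averaging $F$ over all $N!$ matchings. The sketch below assumes $\mu(N) = \sum_{(i,j)} r_i(r_i-1)c_j(c_j-1) / (2N(N-1))$, as read from the definition above, and uses the fact that an entry with value $t$ contains $\binom{t}{2}$ matched double edges; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def mu(r, c):
    """mu(N) = sum_{i,j} r_i (r_i - 1) c_j (c_j - 1) / (2 N (N - 1))."""
    n = sum(r)
    total = sum(x * (x - 1) for x in r) * sum(y * (y - 1) for y in c)
    return Fraction(total, 2 * n * (n - 1))


def mean_double_edges(r, c):
    """Average F over all N! matchings, by brute force (tiny inputs only)."""
    row_tokens = [i for i, ri in enumerate(r) for _ in range(ri)]
    col_tokens = [j for j, cj in enumerate(c) for _ in range(cj)]
    total = count = 0
    for perm in permutations(range(len(col_tokens))):
        table = {}
        for i, pos in zip(row_tokens, perm):
            key = (i, col_tokens[pos])
            table[key] = table.get(key, 0) + 1
        # An entry with value t contains C(t, 2) matched double edges.
        total += sum(comb(t, 2) for t in table.values())
        count += 1
    return Fraction(total, count)


print(mean_double_edges((3, 2), (2, 2, 1)), mu((3, 2), (2, 2, 1)))
```

The two printed values agree because the expectation of $F$ is a sum of the individual matching probabilities, by linearity.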



We now derive an upper bound for $\mathrm{Var}(F)$. Note that

$$E F^2 = \sum_{(i,j) \in I} \sum_{B \in \mathcal{B}_2(i,j)} \sum_{(i',j') \in I} \sum_{B' \in \mathcal{B}_2(i',j')} P(\mathcal{M}(B) \cap \mathcal{M}(B')),$$

from which we can write

$$\mathrm{Var}(F) = \sum_{(i,j) \in I} \sum_{B \in \mathcal{B}_2(i,j)} P(\mathcal{M}(B)) \sum_{(i',j') \in I} \sum_{B' \in \mathcal{B}_2(i',j')} \big( P(\mathcal{M}(B') \mid \mathcal{M}(B)) - P(\mathcal{M}(B')) \big) \tag{8}$$

$$\leq \sum_{(i,j) \in I} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}} \big( 1 + \varphi_1(i,j) + \varphi_2(i,j) + \varphi_3(i,j) + \varphi_4(i,j) \big),$$

where the terms 1 and $\varphi_1(i,j)$ to $\varphi_4(i,j)$ are explained next. First of all, when $(i,j) = (i',j')$ and $B = B'$ we have that $P(\mathcal{M}(B') \mid \mathcal{M}(B)) - P(\mathcal{M}(B')) \leq 1$. Now, the term $\varphi_1(i,j)$ corresponds to the cases where $B$ and $B'$ are double edges for the same entry $(i,j)$ and also have one edge in common (i.e., $B \cup B'$ is a set of three edges). In such cases, to compute $\varphi_1(i,j)$ we shall use $P(\mathcal{M}(B') \mid \mathcal{M}(B)) - P(\mathcal{M}(B')) \leq P(\mathcal{M}(B') \mid \mathcal{M}(B))$ and simply estimate $P(\mathcal{M}(B') \mid \mathcal{M}(B))$. The term $\varphi_2(i,j)$ corresponds to the terms where $B$ and $B'$ are double edges for the same entry $(i,j)$ but have no edge in common (i.e., $B \cup B'$ is a set of four edges). Before proceeding to describe $\varphi_3(i,j)$ and $\varphi_4(i,j)$, let us explain how to compute $\varphi_1(i,j)$ and $\varphi_2(i,j)$, which we express as

$$\varphi_1(i,j) = 2 \, \frac{(r_i - 2)(c_j - 2)}{N - 2} \tag{9}$$

and

$$\varphi_2(i,j) = \frac{(r_i - 2)^{\underline{2}} (c_j - 2)^{\underline{2}}}{2 (N - 2)^{\underline{2}}} - \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}}. \tag{10}$$

Given that a double edge $\{e_1, e_2\}$ is chosen from the entry $(i,j)$, the factor $\frac{(r_i - 2)(c_j - 2)}{N - 2}$ in $\varphi_1(i,j)$ corresponds to the probability that another edge $e_3$ from $(i,j)$ is chosen. The additional factor 2 comes from the fact that, once we fix $e_3$, there are 2 possible choices of double edges for $B'$, namely $B' = \{e_1, e_3\}$ and $B' = \{e_2, e_3\}$. The equation for $\varphi_2(i,j)$ is obtained in a similar way, but we need to compute the probability that we choose a double edge $\{e_3, e_4\}$ from $(i,j)$ such that $\{e_1, e_2\} \cap \{e_3, e_4\} = \emptyset$, which gives the term $\frac{(r_i - 2)^{\underline{2}} (c_j - 2)^{\underline{2}}}{2 (N - 2)^{\underline{2}}}$. The last term in (10) comes from the term $P(\mathcal{M}(B'))$ from (8), which is the probability that a double edge $\{e_3, e_4\}$ is chosen independently of $\{e_1, e_2\}$.

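The term $\varphi_3(i,j)$ corresponds to the terms where $B$ and $B'$ are double edges for the same row but different columns or for the same column but different rows. Using similar reasoning we obtain

$$\varphi_3(i,j) = \sum_{j' \neq j} \left( \frac{(r_i - 2)^{\underline{2}} c_{j'}^{\underline{2}}}{2 (N - 2)^{\underline{2}}} - \frac{r_i^{\underline{2}} c_{j'}^{\underline{2}}}{2 N^{\underline{2}}} \right) + \sum_{i' \neq i} \left( \frac{r_{i'}^{\underline{2}} (c_j - 2)^{\underline{2}}}{2 (N - 2)^{\underline{2}}} - \frac{r_{i'}^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}} \right) \leq 0. \tag{11}$$

Finally, $\varphi_4(i,j)$ corresponds to terms where $B$ and $B'$ are double edges for different rows and different columns and is given by

$$\varphi_4(i,j) = \sum_{i' \neq i} \sum_{j' \neq j} \left( \frac{r_{i'}^{\underline{2}} c_{j'}^{\underline{2}}}{2 (N - 2)^{\underline{2}}} - \frac{r_{i'}^{\underline{2}} c_{j'}^{\underline{2}}}{2 N^{\underline{2}}} \right). \tag{12}$$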

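Now we simplify the equations above. For $\varphi_2(i,j)$ and $\varphi_3(i,j)$ we simply use the fact that they are at most 0. For $\varphi_1(i,j)$, we have that (5) implies

$$\varphi_1(i,j) \leq 2 \, \frac{\sqrt{r_i(r_i - 1) c_j(c_j - 1)}}{N - 2} \leq 2 \sqrt{2 \mu(N)} \, (1 + O(1/N)), \tag{13}$$

for $r_i, c_j \geq 2$, where the second inequality uses that a single term $r_i^{\underline{2}} c_j^{\underline{2}}$ is at most $2 N^{\underline{2}} \mu(N)$. Clearly, the entries $(i,j)$ with $r_i \leq 1$ or $c_j \leq 1$ do not contribute to $F$. Since $1/(N - 2)^{\underline{2}} - 1/N^{\underline{2}} = O(1/N^3)$ we can write

$$\varphi_4(i,j) \leq \sum_{i'} \sum_{j'} \frac{r_{i'}^{\underline{2}} c_{j'}^{\underline{2}}}{2} \, O(1/N^3) = O(\mu(N)/N).$$

Therefore, the variance of $F$ translates to

$$\mathrm{Var}(F) \leq \sum_{(i,j) \in I} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2 N^{\underline{2}}} \left( 1 + 2 \sqrt{2 \mu(N)} + O(\mu(N)/N) \right) = \mu(N) \, O\!\left( \sqrt{\mu(N)} + \mu(N)/N \right), \tag{14}$$

where the last step absorbs the constant 1 into $O(\sqrt{\mu(N)})$, which is valid whenever $\mu(N) \geq 1$.

Since $r_i, c_j \leq N$ for all $(i,j) \in I$, we have $\mu(N) = O(N^2)$, and consequently, $\sqrt{\mu(N)} = \Omega(\mu(N)/N)$. Now, plugging (6) and (14) into (7), we obtain that there are constants $C$ and $N_0$ such that for all $N \geq N_0$ with $\mu(N) \geq 1$ we obtain

$$P(T \in \Omega_{r,c}) = P(F \leq 0) \leq \frac{C}{\sqrt{\mu(N)}}.$$

Taking the liminf as $N \to \infty$ concludes the proof of Proposition 4.1 since $\limsup_{N \to \infty} \mu(N) = \infty$.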

6 Proof of Proposition 14.21 

In the proof of Proposition 4.2 we assume that $2 \le c_1 \le r_1 = o(N)$ and that there exists a constant $C > 0$ such that for all sufficiently large $N$ it holds that

E S * c - < 15 > 

(hj)ei 
Figure 2: Diagram illustrating the entries in $I_L$ (shaded area in part (a)) and the definition of $I_L^r$, $I_L^c$, $s_i$, and $t_j$.

We split the table $T$ into two regions $I_L$ and $I_S$. Let $\epsilon > 0$ be a small number that we will set later, and define the index sets $I_L = \{(i,j) \in I : (r_i - 1)(c_j - 1) \ge \epsilon N\}$ and $I_S = I \setminus I_L$. We remark that the set $I_L$ may be empty. Intuitively, $I_L$ represents the entries of $T$ with large row and column sums. Since the $r_i$'s and the $c_j$'s are assumed to be non-increasing, a useful conceptual diagram for the definition of $I_L$ is given by the shaded area in Figure 2(a).
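The split into large and small entries can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_entries` is hypothetical, the index set $I$ is taken to be all entries of the table, indices are 0-based, and entries exactly on the threshold are placed in the large set.

```python
def split_entries(r, c, eps):
    """Split table entries into a 'large' set and a 'small' set.

    An entry (i, j) is large when (r_i - 1)(c_j - 1) >= eps * N, where N is
    the common total of the row sums and of the column sums. Treating the
    boundary case as large is an assumption of this sketch.
    """
    r = sorted(r, reverse=True)  # the text assumes non-increasing sums
    c = sorted(c, reverse=True)
    n_total = sum(r)
    assert n_total == sum(c), "row sums and column sums must both add up to N"
    large, small = [], []
    for i, ri in enumerate(r):
        for j, cj in enumerate(c):
            if (ri - 1) * (cj - 1) >= eps * n_total:
                large.append((i, j))
            else:
                small.append((i, j))
    return large, small


# Example: N = 10 and eps = 0.5, so the threshold is 5.
large, small = split_entries([6, 2, 1, 1], [5, 3, 1, 1], 0.5)
print(large)       # entries of the first row in the two largest columns
print(len(small))  # all remaining entries
```

Because the sums are sorted, the large set always forms a staircase in the upper-left corner of the table, which is the shape the shaded area of the figure describes.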



Let $Z_L$ be the number of non-binary entries in $I_L$, and $Z_S$ be the number of non-binary entries in $I_S$. Clearly, $Z = Z_L + Z_S$. Let $W_L$ be the sum of the entries in $I_L$ (i.e., $W_L = \sum_{(i,j) \in I_L} T_{i,j}$). Note that $\{W_L = 0\} \subseteq \{Z_L = 0\}$, which gives $P(T \in \Omega_{r,c}) \ge P(W_L = 0)\, P(Z_S = 0 \mid W_L = 0)$. We will deal with the terms $P(W_L = 0)$ and $P(Z_S = 0 \mid W_L = 0)$ separately. We will need the following definitions. Let $I_L^r$ be the set of rows with at least one entry in $I_L$ (i.e., $I_L^r = \{i : (i,j) \in I_L \text{ for some } j\}$), and $I_L^c$ be the set of columns with at least one entry in $I_L$ (see Figure 2(a)). Lemma 6.1 below deals with the term $P(W_L = 0)$.

Lemma 6.1. If (15) is satisfied and $r_1 = o(N)$, then we obtain $|I_L| = O(1)$ and $P(W_L = 0) = \Omega(1)$.



Proof. We assume that $I_L$ is not empty (otherwise the lemma vacuously holds) and that (15) is satisfied. Note the importance of the assumption $c_1 \le r_1 = o(N)$; if for instance $r_i = \Omega(N)$ for all $i$, then all the entries from column 1 could be contained in $I_L$ and, therefore, we would have $P(W_L = 0) = 0$, violating the statement of the lemma.

Let $\gamma = \sum_{(i,j) \in I_L} r_i c_j / N$. We show that $\gamma, |I_L| = O(1)$. For all sufficiently large $N$ we have



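$$C \ge \sum_{(i,j) \in I_L} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2N^2} \ge \sum_{(i,j) \in I_L} \frac{\epsilon\, r_i c_j}{2N} = \frac{\epsilon \gamma}{2}, \tag{16}$$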



where the second inequality is obtained from $(r_i - 1)(c_j - 1) \ge \epsilon N$ for all $(i,j) \in I_L$, by the definition of $I_L$. On the other hand, from the definition of $I_L$ we obtain

$$\gamma = \sum_{(i,j) \in I_L} \frac{r_i c_j}{N} \ge \sum_{(i,j) \in I_L} \frac{(r_i - 1)(c_j - 1)}{N} \ge |I_L|\,\epsilon.$$

Consequently, combining the previous estimate with (16) we conclude that $\gamma, |I_L| = O(1)$.

It is useful to see Figure 2(a,b) throughout this discussion. Let $s_i = \sum_{j' : (i,j') \in I_L} c_{j'}$ and $t_j = \sum_{i' : (i',j) \in I_L} r_{i'}$. In words, for any row $i \in I_L^r$, $s_i$ is the sum of the column sums over all entries in $I_L$ corresponding to row $i$. Similarly, $t_j$ is defined for any column $j \in I_L^c$ as the sum of the row sums over all entries in $I_L$ corresponding to column $j$. Note that $\gamma N = \sum_{i \in I_L^r} r_i s_i = \sum_{j \in I_L^c} c_j t_j$.

We now derive a lower bound for $P(W_L = 0)$. Consider row 1, which belongs to $I_L^r$. The number of ways we can match the $r_1$ type-1 tokens of the first row with the $N - s_1$ type-2 tokens available outside $I_L$ is $(N - s_1)^{\underline{r_1}}$. Employing this reasoning for each row in $I_L^r$, we obtain (let $\ell = |I_L^r|$)



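$$P(W_L = 0) \ge \frac{(N - s_1)^{\underline{r_1}}}{N^{\underline{r_1}}} \cdot \frac{(N - r_1 - s_2)^{\underline{r_2}}}{(N - r_1)^{\underline{r_2}}} \cdots \frac{(N - r_1 - \cdots - r_{\ell-1} - s_\ell)^{\underline{r_\ell}}}{(N - r_1 - \cdots - r_{\ell-1})^{\underline{r_\ell}}}$$

$$\ge \prod_{i \in I_L^r} \left(1 - \frac{s_i}{N - o(N)}\right)^{r_i} \tag{17}$$

$$\ge \exp\left(-\sum_{i \in I_L^r} \frac{r_i s_i}{N\,[1 - o(1) - O(s_i/N)]}\right) \tag{18}$$

$$\ge \exp(-\gamma) - o(1),$$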



where in (17) we used the fact that $\sum_{i' \in I_L^r} r_{i'} \le r_1 |I_L^r| = o(N)$, and (18) comes from the fact that $(1 - x) \ge \exp(-x/(1 - x))$ for all $x \in [0,1)$, which we apply with $x = s_i/(N - o(N))$ for each $i \in I_L^r$. Moreover, since $r_1 \ge c_1$ and (15) holds, then $c_j = O(\sqrt{N})$ for all $j$, and consequently, $s_i \le c_1 |I_L^c| = O(\sqrt{N})$. This completes the proof of the lemma, since $\gamma = O(1)$. □
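The elementary inequality $(1 - x) \ge \exp(-x/(1 - x))$ used in the proof is equivalent to $\log y \le y - 1$ with $y = 1/(1 - x)$. A quick numerical sanity check on a grid is sketched below; the helper name `lower_bound_gap` and the choice of grid are illustrative assumptions only.

```python
import math


def lower_bound_gap(x):
    """Return (1 - x) - exp(-x / (1 - x)) for 0 <= x < 1.

    The inequality used in the proof states that this gap is non-negative.
    """
    return (1.0 - x) - math.exp(-x / (1.0 - x))


# Evaluate the gap on the grid 0, 0.001, ..., 0.999.
gaps = [lower_bound_gap(k / 1000) for k in range(1000)]
print(min(gaps) >= -1e-12)  # non-negative up to floating-point rounding
```

The gap vanishes at $x = 0$ and grows like $x^2/2$ for small $x$, which is why applying the bound with $x = s_i/(N - o(N))$ costs only lower-order terms.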

For the term $P(Z_S = 0 \mid W_L = 0)$, we use the fact that $I_L$ contains only $O(1)$ entries to conclude that conditioning on $I_L$ has a small effect. Then, we can carry out the analysis as if no conditioning is being made, and we use the fact that $(r_i - 1)(c_j - 1) < \epsilon N$ for $(i,j) \in I_S$ to simplify the calculations. The following lemma deals with this case.

Lemma 6.2. If (15) is satisfied, then $P(Z_S = 0 \mid W_L = 0) = \Omega(1)$.

The proof of Lemma 6.2 is rather delicate and will require additional lemmas. We devote the remainder of this section to proving Lemma 6.2. Note that the proof of Proposition 4.2 follows from Lemmas 6.1 and 6.2.

Proof of Lemma 6.2

We will use the following quantity

$$\lambda = \lambda(N) = \sum_{(i,j) \in I_S} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2N^2}.$$

Clearly, $\limsup_{N \to \infty} \lambda \le C$ given (15), and $\lambda = E[Z_S]$.
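For concreteness, this quantity can be evaluated directly from the margins. The sketch below assumes, as the surrounding formulas suggest, that each term is the product of the falling factorials $r_i(r_i - 1)$ and $c_j(c_j - 1)$ divided by $2N^2$; the function names are hypothetical and the caller supplies the list of (0-based) entries to sum over.

```python
def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


def double_edge_mass(r, c, entries):
    """Sum of r_i(r_i - 1) c_j(c_j - 1) / (2 N^2) over the given entries."""
    n_total = sum(r)
    assert n_total == sum(c), "row sums and column sums must both add up to N"
    total = sum(falling(r[i], 2) * falling(c[j], 2) for i, j in entries)
    return total / (2 * n_total * n_total)


# Example: a 2 x 2 table with all margins equal to 2, so N = 4.
all_entries = [(i, j) for i in range(2) for j in range(2)]
print(double_edge_mass([2, 2], [2, 2], all_entries))
```

Rows or columns with sum 1 contribute nothing, which matches the observation that only entries with $r_i, c_j \ge 2$ can hold a double edge.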

Before proceeding to the proof, we need to introduce some notation. For an integer $k \ge 1$ and any $(i,j) \in I$, let $\mathcal{B}_k(i,j)$ be the set of all sets of $k$ disjoint edges between row $i$ and column $j$, which generalizes the definition of $\mathcal{B}_2(i,j)$ from Section 5. A typical element of $\mathcal{B}_k(i,j)$ has the form $\{e_1, e_2, \ldots, e_k\}$, where $e_1, e_2, \ldots, e_k$ are disjoint edges between row $i$ and column $j$. Recall that the edges $e_1, e_2, \ldots, e_k$ are said to be disjoint when each edge $e_\ell$, $1 \le \ell \le k$, corresponds to a pair of tokens $\{\beta_\ell, \beta'_\ell\}$ such that $\beta_1, \beta_2, \ldots, \beta_k$ are pairwise distinct and $\beta'_1, \beta'_2, \ldots, \beta'_k$ are pairwise distinct, that is, the edges $e_1, e_2, \ldots, e_k$ do not share tokens. Intuitively, $\mathcal{B}_k(i,j)$ is the set of all possible ways we can match $k$ edges between row $i$ and column $j$. So, clearly $|\mathcal{B}_k(i,j)| = r_i^{\underline{k}} c_j^{\underline{k}} / k!$, for all $(i,j) \in I$ and all $k$.
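The count of sets of $k$ disjoint edges can be confirmed by brute-force enumeration for small values. In the sketch below, tokens are modeled as integers and an edge as a pair (type-1 token, type-2 token); the function names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial


def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


def disjoint_edge_sets(r_i, c_j, k):
    """Enumerate all sets of k edges between r_i type-1 tokens and c_j
    type-2 tokens such that no two edges share a token."""
    sets = set()
    for rows in combinations(range(r_i), k):      # which type-1 tokens are used
        for cols in permutations(range(c_j), k):  # their type-2 partners, in order
            sets.add(frozenset(zip(rows, cols)))
    return sets


# Compare the enumeration with the closed form for a few small cases.
for r_i, c_j, k in [(4, 3, 2), (3, 3, 3), (5, 4, 1), (2, 3, 3)]:
    count = len(disjoint_edge_sets(r_i, c_j, k))
    closed_form = falling(r_i, k) * falling(c_j, k) // factorial(k)
    print(r_i, c_j, k, count, closed_form)
```

Choosing the type-1 tokens as an unordered subset and the type-2 tokens as an ordered tuple produces each matching exactly once, which is where the division by $k!$ comes from.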

For any set of edges $B$, we define $\mathcal{M}(B)$ as the event that all the edges in $B$ are matched by the configuration model. Note that the occurrence of $\mathcal{M}(B)$ for $B \in \mathcal{B}_k(i,j)$ means that the entry $T_{i,j} \ge k$. For all $k$, define $\mathcal{B}_k = \bigcup_{(i,j) \in I_S} \mathcal{B}_k(i,j)$. Finally, for all $k \ge 1$, define the event $\mathcal{P}_k = \bigcup_{B \in \mathcal{B}_k} \mathcal{M}(B)$. In words, $\mathcal{P}_k$ is the event that there is an entry $T_{i,j} \ge k$ in $I_S$. Using our notation we can write

$$P(Z_S \ge 1 \mid W_L = 0) = P(\mathcal{P}_2 \mid W_L = 0). \tag{19}$$

When (15) is satisfied, we can show that the probability that $\mathcal{P}_3$ occurs is small. We exploit this fact to simplify the calculations via the following trivial inequalities

$$P(\mathcal{P}_2 \mid W_L = 0) \ge P(\mathcal{P}_2 \cap \mathcal{P}_3^c \mid W_L = 0) \tag{20}$$

and

$$P(\mathcal{P}_2 \mid W_L = 0) \le P(\mathcal{P}_2 \cap \mathcal{P}_3^c \mid W_L = 0) + P(\mathcal{P}_3 \mid W_L = 0). \tag{21}$$

We start the proof by stating a lemma that indicates that for any two disjoint sets of disjoint edges $B$ and $B'$, the probability that the edges in $B$ are matched by the configuration model conditioning on $B'$ being matched and $W_L = 0$ is essentially the same as without conditioning.

Lemma 6.3. Let $B$ and $B'$ be disjoint sets of disjoint edges from $I_S$ such that the event $\mathcal{M}(B \cup B')$ has non-zero probability when conditioned on $W_L = 0$. Let $k$ and $k'$ be the number of edges in $B$ and $B'$, respectively. If $k, k' = O(1)$, we obtain

$$P(\mathcal{M}(B) \mid \mathcal{M}(B'), W_L = 0) = \frac{1 + o(1)}{(N - k')^{\underline{k}}}. \tag{22}$$

Proof. Before proceeding to the proof we show how to sample a table $T$ under the condition $W_L = 0$ and $\mathcal{M}(B')$. If we were to condition only on $\mathcal{M}(B')$, it would suffice to disregard the tokens corresponding to the edges in $B'$ and take a random pairing for the remaining tokens in a standard fashion. However, conditioning on $W_L = 0$ is more delicate. For example, conditioning on $W_L = 0$ implies that a pair of tokens from row $i$ and column $j$ for which $(i,j) \in I_L$ cannot be matched since all the entries in $I_L$ are 0.

Define $s_i = \sum_{j' : (i,j') \in I_L} c_{j'}$ and $t_j = \sum_{i' : (i',j) \in I_L} r_{i'}$ (refer to Figure 2(b) for a pictorial illustration of $s_i$ and $t_j$). Let $\rho_i$ be the number of edges in $B'$ corresponding to row $i$, and let $\rho'_j$ be the number of edges in $B'$ corresponding to column $j$. We sample $T$ in a column-by-column manner, but the order according to which we sample the columns will matter. For each column $j$, there is a set $X_j$ of type-1 tokens that can be matched to type-2 tokens from column $j$ given $W_L = 0$ and $B'$. Also, since the columns are in non-increasing order, we obtain that $X_j \subseteq X_{j'}$ for all $j' \in [j,n]$. Our strategy is to sample the columns in a non-increasing order, starting from column 1 until column $|I_L^c|$. (Recall the definition $I_L^c = \{j : (i,j) \in I_L \text{ for some } i\}$.) Then, at that moment, all entries corresponding to a column in $I_L$ have already been assigned. Therefore, sampling the remaining entries of the table does not depend on $W_L = 0$ and we carry out the sampling trivially using the standard procedure for the configuration model.

We now describe how to sample the entries from a column $j \in I_L^c$ given that all columns from 1 to $j - 1$ have already been sampled. Note that there are $c_j - \rho'_j$ type-2 tokens still unmatched for column $j$. The main property we use is that $X_j \subseteq X_{j'}$ for all $j' \in [j,n]$, that is, the type-1 tokens that can be matched to the type-2 tokens from column $j$ can also be matched to any other column $j'$ that has not yet been sampled. Therefore, it follows that each possible way to match the tokens from column $j$ is equally likely; we can take a uniformly random permutation of the type-1 tokens in $X_j$ that have not been matched to any column $j' < j$ and select the first $c_j - \rho'_j$ to be matched to the type-2 tokens from column $j$.

Recall that $k$ and $k'$ are the cardinalities of $B$ and $B'$, respectively. Now we proceed to the proof of (22). First, assume $k = 1$, i.e., $B = \{e_1\} = \{(\beta_1, \beta'_1)\}$, where $\beta_1$ is a type-1 token from some row $i$ and $\beta'_1$ is a type-2 token from some column $j$. We denote by $X_j$ the set of entries in $I_L$ corresponding to columns $j' \le j - 1$. Formally, $X_j = \{(i',j') \in I_L : j' \in [1, j - 1]\}$. We need to consider two cases.

Case 1: $\beta'_1$ corresponds to a column $j \in I_L^c$.

We write the probability that $\beta_1$ is matched to $\beta'_1$ as $q_1 q_2$, where $q_1$ is the probability that $\beta_1$ has not been matched to any column $j' \le j - 1$ and $q_2$ is the probability that $\beta_1$ is matched to $\beta'_1$ given that it was not matched to any column $j' \le j - 1$. We start with $q_2$. Note that there are $\zeta_j = N - t_j - \sum_{j'=1}^{j-1} c_{j'} - \left(k' - \sum_{j'=1}^{j-1} \rho'_{j'}\right)$ type-1 tokens available to be matched to $\beta'_1$. Note that $t_j \le r_1 |I_L^r| = o(N)$ and $\sum_{j'=1}^{j-1} c_{j'} - \sum_{j'=1}^{j-1} \rho'_{j'} = O(j\sqrt{N})$. Therefore, since $j = O(1)$, we have

$$q_2 = \frac{1}{\zeta_j} = \frac{1}{N - k' - o(N)}.$$
For $q_1$, note first that the probability that $\beta_1$ is not matched to any type-2 token from a column $j'$ is

$$\frac{\zeta_{j'} - (c_{j'} - \rho'_{j'})}{\zeta_{j'}} = 1 - \frac{c_{j'} - \rho'_{j'}}{\zeta_{j'}} = 1 - O(1/\sqrt{N}),$$

since $k' = O(1)$. Recall that $i$ is the row associated with the token $\beta_1$. Clearly, when assigning a column $j'$ for which $(i,j') \in I_L$, we have that $\beta_1$ will not be matched to any token from column $j'$ since we are conditioning on $W_L = 0$. Therefore, we obtain for $q_1$

$$q_1 = \prod_{j'=1}^{j-1} \left(1 - O(1/\sqrt{N})\right) = 1 - O(1/\sqrt{N}),$$

since $j = O(1)$. We then obtain that $\beta_1$ is matched to $\beta'_1$ with probability $\frac{1}{N - k'}(1 + o(1))$.

Case 2: $\beta'_1$ corresponds to a column $j \notin I_L^c$.

Again, with probability $1 - O(1/\sqrt{N})$, $\beta_1$ was not matched to any type-2 token from columns in $I_L^c$. When this happens, there are still $N - \sum_{j' \in I_L^c} c_{j'} - \left(k' - \sum_{j' \in I_L^c} \rho'_{j'}\right) = N - k' - O(\sqrt{N})$ type-1 tokens to be matched to $\beta'_1$ and the probability that $\beta_1$ and $\beta'_1$ are matched is $\frac{1}{N - k'}(1 + O(1/\sqrt{N}))$.

Therefore, for $k = 1$ and $k' = O(1)$, we obtain $P(\mathcal{M}(B) \mid \mathcal{M}(B'), W_L = 0) = \frac{1}{N - k'}(1 + o(1))$. When $k \ge 2$, let $B = \{e_1, e_2, \ldots, e_k\}$ and $B_\ell$ be the first $\ell$ edges in $B$, i.e., $B_\ell = \{e_1, e_2, \ldots, e_\ell\}$. For convenience, define $B_0 = \emptyset$. Therefore

$$P(\mathcal{M}(B) \mid \mathcal{M}(B'), W_L = 0) = \prod_{\ell=1}^{k} P(\mathcal{M}(\{e_\ell\}) \mid \mathcal{M}(B_{\ell-1}), \mathcal{M}(B'), W_L = 0),$$

which concludes the proof of Lemma 6.3 since $k, k' = O(1)$. □
To simplify the calculations to follow, we first solve the simpler case when $\lambda = o(1)$. For this we prove Lemma 6.4 below, which gives a stronger result.

Lemma 6.4. If $\limsup_{N \to \infty} \lambda < 1$, we have $P(Z_S = 0 \mid W_L = 0) = \Omega(1)$.

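Proof. We will show that $\liminf_{N \to \infty} P(Z_S = 0 \mid W_L = 0) > 0$. Using Markov's inequality, we can write

$$\liminf_{N \to \infty} P(Z_S = 0 \mid W_L = 0) = 1 - \limsup_{N \to \infty} P(Z_S \ge 1 \mid W_L = 0) \ge 1 - \limsup_{N \to \infty} E[Z_S \mid W_L = 0].$$

Using linearity of expectation, we obtain $E[Z_S \mid W_L = 0] = \sum_{(i,j) \in I_S} \sum_{B \in \mathcal{B}_2(i,j)} P(\mathcal{M}(B) \mid W_L = 0)$. From Lemma 6.3 we have $P(\mathcal{M}(B) \mid W_L = 0) = (1 + o(1))/N^{\underline{2}}$, which gives

$$\liminf_{N \to \infty} P(Z_S = 0 \mid W_L = 0) \ge 1 - \limsup_{N \to \infty} \sum_{(i,j) \in I_S} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2N^{\underline{2}}}(1 + o(1)) \ge 1 - \limsup_{N \to \infty} \lambda.$$

□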

From now on, we will assume that $\lambda = \Omega(1)$. The case where $\lambda$ is neither $o(1)$ nor $\Omega(1)$ can be handled by Lemma 4.5 by setting $f(r,c) = \lambda$ and $Z$ as the space of sequences satisfying the conditions in Proposition 4.2.

Now we use Lemma 6.3 to show that the bounds in (20) and (21) are tight up to smaller-order terms. This simplification is the main reason for treating $I_L$ and $I_S$ separately.

Lemma 6.5. Conditional on $W_L = 0$, the probability that the configuration model creates three edges for any entry in $I_S$ can be upper bounded by $P(\mathcal{P}_3 \mid W_L = 0) \le \lambda\epsilon/3 + o(1)$.

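Proof. For any $(i,j) \in I_S$, the number of ways to match three edges from $(i,j)$ is $r_i^{\underline{3}} c_j^{\underline{3}}/3!$, and

$$P(\mathcal{P}_3 \mid W_L = 0) \le \sum_{(i,j) \in I_S} \sum_{B \in \mathcal{B}_3(i,j)} P(\mathcal{M}(B) \mid W_L = 0).$$

We use $(r_i - 1)(c_j - 1) < \epsilon N$ and Lemma 6.3 to conclude that for all $(i,j) \in I_S$

$$\sum_{(i,j) \in I_S} \frac{r_i^{\underline{3}} c_j^{\underline{3}}}{3!\, N^{\underline{3}}}(1 + o(1)) \le \sum_{(i,j) \in I_S} \frac{r_i^{\underline{2}} c_j^{\underline{2}}\, \epsilon N}{6 N^{\underline{3}}}(1 + o(1)),$$

which together with (15) yields the validity of Lemma 6.5. □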

It remains to derive a bound for the term $P(\mathcal{P}_2 \cap \mathcal{P}_3^c \mid W_L = 0)$. We apply the inclusion-exclusion principle. Define the set $\mathcal{D}_\ell$ to contain all possible sets of the form $\{d_1, d_2, \ldots, d_\ell\}$ with $d_1, d_2, \ldots, d_\ell$ being distinct double edges (i.e., distinct elements from $\mathcal{B}_2$). Ideally, we would like $\{d_1, d_2, \ldots, d_\ell\} \in \mathcal{D}_\ell$ to represent a possible choice of $\ell$ double edges for $\ell$ distinct entries of the table. However, it may be the case that, say, $d_1$ and $d_2$ are double edges for the same entry. In this case, if $d_1$ and $d_2$ have one edge in common, then they correspond to having 3 edges matched for that entry. Otherwise, if $d_1$ and $d_2$ do not have an edge in common, then they correspond to having 4 edges matched for that entry. In any of these cases, we count these terms in the event $\mathcal{P}_3$, which we treat separately using Lemma 6.5. That is the reason why we derive the probability for $\mathcal{P}_2 \cap \mathcal{P}_3^c$ instead of working directly with $\mathcal{P}_2$. This is also the main benefit we obtain from considering the entries in $I_L$ and $I_S$ separately. The elements of $\mathcal{D}_\ell$ that count for the event $\mathcal{P}_2 \cap \mathcal{P}_3^c$ are only those corresponding to $\ell$ double edges from $\ell$ distinct entries. There is also one additional case. There exist terms of the form $D = \{d_1, d_2, \ldots, d_\ell\} \in \mathcal{D}_\ell$ such that, say, $d_1$ and $d_2$ share a token. This happens if for example $d_1$ is a double edge for the entry $(i,j)$, $d_2$ is a double edge for the entry $(i,j')$, and one of the type-1 tokens from row $i$ contained in $d_1$ is also contained in $d_2$. However, should that be the case, then $d_1$ and $d_2$ cannot occur simultaneously and the event that all double edges in $D$ are matched by the configuration model has probability 0.

Abusing notation slightly, for an element $D \in \mathcal{D}_\ell$, we denote by $\mathcal{M}(D)$ the event that all $\ell$ double edges in $D$ are matched by the configuration model. Therefore, using the inclusion-exclusion principle, we obtain

$$P(\mathcal{P}_2 \cap \mathcal{P}_3^c \mid W_L = 0) = \sum_{\ell \ge 1} (-1)^{\ell+1} p_\ell, \tag{23}$$

where $p_\ell = \sum_{D \in \mathcal{D}_\ell} P(\mathcal{M}(D) \cap \mathcal{P}_3^c \mid W_L = 0)$. We will take a value $L > 0$ sufficiently large that we will set later and consider the value of $p_\ell$ for $\ell \le L$. The following lemma gives lower and upper bounds for $p_\ell$.

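Lemma 6.6. Assume $\lambda = \Omega(1)$ and fix $L$. Let $\xi > 0$ be an arbitrarily small constant. We can set $\epsilon = \epsilon(\lambda, L, \xi)$ sufficiently small in the definition of $I_L$ so that for all $\ell \le L$ we have

$$\frac{\lambda^\ell}{\ell!}(1 - \xi - o(1)) \le p_\ell \le \frac{\lambda^\ell}{\ell!} + o(1). \tag{24}$$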

Proof. Let $\mathcal{J}_\ell$ be the set of all sets of $\ell$ distinct elements from $I_S$, that is, $J \in \mathcal{J}_\ell$ has the form $J = \{(i_1,j_1), (i_2,j_2), \ldots, (i_\ell,j_\ell)\}$, where each $(i_k,j_k)$ corresponds to a distinct entry from $I_S$. Now let $\mathcal{B}_2(J)$, for $J \in \mathcal{J}_\ell$, be the set of all possible ways we can choose one double edge from each element of $J$. That is, if $J = \{(i_1,j_1), (i_2,j_2), \ldots, (i_\ell,j_\ell)\}$, then a typical element $B \in \mathcal{B}_2(J)$ has the form $B = \{d_1, d_2, \ldots, d_\ell\}$, where $d_k$ is a double edge corresponding to the entry $(i_k,j_k)$, for $1 \le k \le \ell$. Recall that in the summation in (23), we obtain $P(\mathcal{M}(D) \cap \mathcal{P}_3^c \mid W_L = 0) = 0$ for all $D \in \mathcal{D}_\ell$ that contain two or more distinct double edges that share a token. Therefore, we obtain the following equality

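$$p_\ell = \sum_{D \in \mathcal{D}_\ell} P(\mathcal{M}(D) \cap \mathcal{P}_3^c \mid W_L = 0) = \sum_{J \in \mathcal{J}_\ell} \sum_{B \in \mathcal{B}_2(J)} P(\mathcal{M}(B) \cap \mathcal{P}_3^c \mid W_L = 0). \tag{25}$$

The last term translates to

$$P(\mathcal{M}(B) \cap \mathcal{P}_3^c \mid W_L = 0) = P(\mathcal{P}_3^c \mid \mathcal{M}(B), W_L = 0)\, P(\mathcal{M}(B) \mid W_L = 0). \tag{26}$$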

We start with the term $P(\mathcal{P}_3^c \mid \mathcal{M}(B), W_L = 0) = 1 - P(\mathcal{P}_3 \mid \mathcal{M}(B), W_L = 0)$. For each $(i,j) \in J$, $B$ contains at least one double edge for the entry $(i,j)$. Therefore, the probability that $\mathcal{P}_3$ happens can be upper bounded by the probability that another edge corresponding to some entry in $J$ is matched plus the probability that 3 edges for some entry not in $J$ are matched, that is, $P(\mathcal{P}_3 \mid \mathcal{M}(B), W_L = 0)$ is at most

$$\sum_{(i,j) \in J}\ \sum_{B' \in \mathcal{B}_1(i,j) \setminus \bigcup B} P(\mathcal{M}(B') \mid \mathcal{M}(B), W_L = 0) + \sum_{(i,j) \notin J}\ \sum_{B' \in \mathcal{B}_3(i,j)} P(\mathcal{M}(B') \mid \mathcal{M}(B), W_L = 0),$$

where the notation $\bigcup B$ represents the union of all elements in $B = \{d_1, d_2, \ldots, d_\ell\}$ (recall that the $d_i$'s are double edges). Note that $\bigcup B$ contains $2\ell$ edges. Therefore, for $\ell = O(1)$ we can use Lemma 6.3 for the first term and a derivation similar to the proof of Lemma 6.5 for the second term to obtain

$P(\mathcal{P}_3 \mid \mathcal{M}(B), W_L = 0)$

* E "'■~v 2K l; 2) ( 1+ °( 1 >>+ E ^qi( 1+ °w> < 27 > 

(*j)eJ (i,i)0J v ; 

< e(L + A/3) + o(l), (28) 

uniformly in J and -B. 

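Now we turn to the term $\sum_{J \in \mathcal{J}_\ell} \sum_{B \in \mathcal{B}_2(J)} P(\mathcal{M}(B) \mid W_L = 0)$, which corresponds to an upper bound for the right hand side of (25). Our goal is to write this term recursively in $\ell$. First, notice that the case $\ell = 1$ reduces to

$$\sum_{J \in \mathcal{J}_1} \sum_{B \in \mathcal{B}_2(J)} P(\mathcal{M}(B) \mid W_L = 0) = \sum_{(i,j) \in I_S} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2N^{\underline{2}}}(1 + o(1)) = \lambda + o(1),$$

where $\lambda = \sum_{(i,j) \in I_S} r_i^{\underline{2}} c_j^{\underline{2}}/(2N^2)$ as defined at the beginning of the proof of Lemma 6.2.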

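Now, consider a fixed $J \in \mathcal{J}_\ell$. Note that we can write $J = J' \cup \{(i,j)\}$ where $J' \in \mathcal{J}_{\ell-1}$ and $(i,j) \in I_S \setminus J'$. Note also that there are $\ell$ possible such values for $J' \subset J$. For a fixed $J'$, let $\eta_i = \eta_i(J')$ be the number of elements in $J'$ corresponding to an entry in row $i$, that is, $\eta_i = |\{(k,k') \in J' : k = i\}|$. Likewise, let $\eta'_j = \eta'_j(J')$ be the number of elements in $J'$ corresponding to an entry in column $j$. Therefore, given $B' \in \mathcal{B}_2(J')$, the sum of row $i$ becomes $r_i - 2\eta_i$ and the sum of column $j$ becomes $c_j - 2\eta'_j$. For $\ell = O(1)$, we can apply Lemma 6.3 to derive the following equality

$$\sum_{J \in \mathcal{J}_\ell} \sum_{B \in \mathcal{B}_2(J)} P(\mathcal{M}(B) \mid W_L = 0)$$

$$= \frac{1}{\ell} \sum_{J' \in \mathcal{J}_{\ell-1}} \sum_{B' \in \mathcal{B}_2(J')} P(\mathcal{M}(B') \mid W_L = 0) \sum_{(i,j) \in I_S \setminus J'} \sum_{B'' \in \mathcal{B}_2(i,j)} P(\mathcal{M}(B'') \mid \mathcal{M}(B'), W_L = 0)$$

$$= \frac{1}{\ell} \sum_{J' \in \mathcal{J}_{\ell-1}} \sum_{B' \in \mathcal{B}_2(J')} P(\mathcal{M}(B') \mid W_L = 0)\,(1 + o(1)) \sum_{(i,j) \in I_S \setminus J'} \frac{(r_i - 2\eta_i)^{\underline{2}} (c_j - 2\eta'_j)^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}}. \tag{29}$$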

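Note that only pairs $(i,j)$ with $r_i, c_j \ge 2$ count for the last sum in (29). So in what follows we assume that $r_i, c_j \ge 2$. Note that $\sum_i \eta_i = \sum_j \eta'_j = \ell - 1$, and letting

$$X = \sum_{(i,j) \in I_S \setminus J'} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} - \sum_{(i,j) \in I_S \setminus J'} \frac{(r_i - 2\eta_i)^{\underline{2}} (c_j - 2\eta'_j)^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}}, \tag{30}$$

we have

$$\sum_{(i,j) \in I_S \setminus J'} \frac{(r_i - 2\eta_i)^{\underline{2}} (c_j - 2\eta'_j)^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} = \sum_{(i,j) \in I_S} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} - \sum_{(i,j) \in J'} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} - X.$$

If we apply the condition $(r_i - 1)(c_j - 1) < \epsilon N$ and use the inequality $y \le 2(y - 1)$ valid for all $y \ge 2$, we can bound the second term in the right hand side above as

$$\sum_{(i,j) \in J'} \frac{r_i^{\underline{2}} c_j^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} \le \sum_{(i,j) \in J'} \frac{4(r_i - 1)^2 (c_j - 1)^2}{2N^2}(1 + O(1/N)) \le 2\epsilon^2(\ell - 1) + O(1/N).$$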

22 22 
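If we expand $X$ in (30) we obtain $X = X_1 - X_2$, where

$$X_1 = \sum_{(i,j) \in I_S \setminus J'} \frac{2\eta'_j(2c_j - 1)\, r_i^{\underline{2}} + 2\eta_i(2r_i - 1)\, c_j^{\underline{2}} + 8\eta_i {\eta'_j}^2 (2r_i - 1) + 8\eta_i^2 \eta'_j (2c_j - 1)}{2(N - 2\ell + 2)^{\underline{2}}}$$

and

$$X_2 = \sum_{(i,j) \in I_S \setminus J'} \frac{4{\eta'_j}^2\, r_i^{\underline{2}} + 4\eta_i^2\, c_j^{\underline{2}} + 4\eta_i \eta'_j (2r_i - 1)(2c_j - 1) + 16\eta_i^2 {\eta'_j}^2}{2(N - 2\ell + 2)^{\underline{2}}}.$$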
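The split of $X$ into a positive part $X_1$ and a negative part $X_2$ rests on a polynomial identity for each entry: writing $x^{\underline{2}} = x(x-1)$, the difference $r^{\underline{2}} c^{\underline{2}} - (r - 2\eta)^{\underline{2}} (c - 2\eta')^{\underline{2}}$ equals the numerator of the $X_1$ term minus the numerator of the $X_2$ term. The following sketch checks this with exact integer arithmetic over a small grid; the function names and the grid are illustrative assumptions.

```python
from itertools import product


def falling2(x):
    """Falling factorial of order two: x (x - 1)."""
    return x * (x - 1)


def numerators(r, c, eta, eta_p):
    """Return (full, x1, x2) for one entry, where
    full = r^(2) c^(2) - (r - 2 eta)^(2) (c - 2 eta')^(2)
    and x1, x2 are the per-entry numerators of X_1 and X_2."""
    full = (falling2(r) * falling2(c)
            - falling2(r - 2 * eta) * falling2(c - 2 * eta_p))
    x1 = (2 * eta_p * (2 * c - 1) * falling2(r)
          + 2 * eta * (2 * r - 1) * falling2(c)
          + 8 * eta * eta_p ** 2 * (2 * r - 1)
          + 8 * eta ** 2 * eta_p * (2 * c - 1))
    x2 = (4 * eta_p ** 2 * falling2(r)
          + 4 * eta ** 2 * falling2(c)
          + 4 * eta * eta_p * (2 * r - 1) * (2 * c - 1)
          + 16 * eta ** 2 * eta_p ** 2)
    return full, x1, x2


# Check full == x1 - x2 on a grid of small integer values.
ok = all(
    full == x1 - x2
    for r, c, eta, eta_p in product(range(2, 9), range(2, 9), range(4), range(4))
    for full, x1, x2 in [numerators(r, c, eta, eta_p)]
)
print(ok)
```

The identity follows from $(r - 2\eta)^{\underline{2}} = r^{\underline{2}} - 2\eta(2r - 1) + 4\eta^2$ and the analogous expansion for the column factor; every term of the $X_2$ numerator is non-negative, which is why dropping it can only increase $X$.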

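Since $r_i, c_j \ge 2$, we have that $(r_i - 1)(2c_j - 1)$ and $(2r_i - 1)(c_j - 1)$ can both be upper bounded by $3(r_i - 1)(c_j - 1)$. Then, applying $(r_i - 1)(c_j - 1) < \epsilon N$ to the first two terms of $X_1$ and using the fact that $\sum_{j=1}^{n} \eta_i {\eta'_j}^2 \le \ell^2 \sum_{j=1}^{n} \eta'_j \le \ell^3$ and similarly $\sum_{i=1}^{m} \eta_i^2 \eta'_j \le \ell^2 \sum_{i=1}^{m} \eta_i \le \ell^3$ for the last two terms of $X_1$, we have $X_1 \le 6\epsilon\ell + O(1/N)$, and using the simple fact $X_2 \ge 0$ we get

$$\sum_{(i,j) \in I_S \setminus J'} \frac{(r_i - 2\eta_i)^{\underline{2}} (c_j - 2\eta'_j)^{\underline{2}}}{2(N - 2\ell + 2)^{\underline{2}}} \ge \lambda - 2\epsilon^2 \ell - 6\epsilon\ell - O(1/N). \tag{31}$$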

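Putting (29) and (31) together, and iterating this procedure $\ell$ times for $\ell = O(1)$ we obtain

$$\frac{(\lambda - 2\epsilon^2 L - 6\epsilon L)^\ell}{\ell!} - o(1) \le \sum_{J \in \mathcal{J}_\ell} \sum_{B \in \mathcal{B}_2(J)} P(\mathcal{M}(B) \mid W_L = 0) \le \frac{\lambda^\ell}{\ell!} + o(1). \tag{32}$$

Since $\lambda = \Omega(1)$ and $\epsilon$ is sufficiently small, we can find a constant $c$ such that

$$\frac{(\lambda - 2\epsilon^2 L - 6\epsilon L)^\ell}{\ell!} \ge \frac{\lambda^\ell}{\ell!}(1 - c\epsilon L)^\ell \ge \frac{\lambda^\ell}{\ell!}(1 - c\epsilon L^2).$$

Putting (26), (28), and (32) together, and plugging the result into (25), we obtain

$$\frac{\lambda^\ell}{\ell!}\left(1 - c\epsilon L^2 - \epsilon(L + \lambda/3) - o(1)\right) \le p_\ell \le \frac{\lambda^\ell}{\ell!} + o(1).$$

This concludes the proof of the lemma since we can set $\epsilon$ sufficiently small so that $\epsilon(cL^2 + L + \lambda/3) \le \xi$. □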

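For some fixed constant $L$, we can use Bonferroni's inequality to obtain a lower bound for $1 - P(\mathcal{P}_2 \cap \mathcal{P}_3^c \mid W_L = 0) = 1 - \sum_{\ell \ge 1} (-1)^{\ell+1} p_\ell$ via

$$1 - \sum_{\ell=1}^{L} (-1)^{\ell+1} p_\ell \ge 1 - \sum_{\ell=1}^{L} (-1)^{\ell+1} \frac{\lambda^\ell}{\ell!} - \sum_{\ell=1}^{L} \frac{\xi \lambda^\ell}{\ell!} - o(1) \ge e^{-\lambda} - \frac{\lambda^L}{L!} - \xi e^{\lambda} - o(1),$$

where $\xi$ is obtained from Lemma 6.6.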
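The role of the cutoff $L$ can be illustrated numerically: the truncated alternating series $\sum_{\ell=0}^{L} (-\lambda)^\ell/\ell!$ approaches $e^{-\lambda}$, with an error controlled by the standard Taylor remainder $\lambda^{L+1}/(L+1)!$. The sketch below checks that remainder bound and searches for the first $L$ with $\lambda^L/L!$ below a quarter of a tolerance $\delta$, mirroring how $L$ is chosen in the text. The function names are hypothetical, and the code uses the textbook remainder bound rather than the exact constants of the proof.

```python
import math


def truncated_series(lam, L):
    """Partial sum of the series for exp(-lam): sum_{l=0}^{L} (-lam)^l / l!."""
    return sum((-lam) ** l / math.factorial(l) for l in range(L + 1))


def smallest_L(lam, delta):
    """Smallest L >= 1 with lam^L / L! < delta / 4."""
    L = 1
    while lam ** L / math.factorial(L) >= delta / 4:
        L += 1
    return L


# The truncation error never exceeds the next term of the series.
for lam in (0.5, 1.0, 3.0):
    for L in range(1, 15):
        error = abs(math.exp(-lam) - truncated_series(lam, L))
        assert error <= lam ** (L + 1) / math.factorial(L + 1) + 1e-12

print(smallest_L(1.0, 0.4))
```

Since $\lambda$ stays bounded, a fixed $L$ works for all large $N$, which is what allows $L$ to be chosen before $\epsilon$.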



Recall that (15) implies $\lambda = O(1)$. For an arbitrarily small constant $\delta > 0$ independent of $N$ (as long as $N$ is sufficiently large) we can set $L$ large enough so that $\lambda^L/L! < \delta/4$. We can also set $\xi$ in Lemma 6.6 so that $e^{\lambda}\xi < \delta/4$. Now, having fixed $L$ and $\xi$, we can set $\epsilon$ small enough so that Lemma 6.6 can be applied and in addition we have $\lambda\epsilon/3 < \delta/4$. Then we put together (19) and (21), and use Lemma 6.5 and (23) to obtain for large enough $N$

$$P(Z_S = 0 \mid W_L = 0) \ge 1 - \sum_{\ell=1}^{L} (-1)^{\ell+1} p_\ell - \lambda\epsilon/3 - o(1) \ge e^{-\lambda} - \delta,$$

which concludes the proof of Lemma 6.2. Then, Proposition 4.2 follows immediately from Lemmas 6.1 and 6.2.



7 Proof of Proposition 4.3

The derivations in this section require a careful analysis of subsequences. For this reason, in this 
section, we will use the full notation for input sequences as described in the paragraph of (pQ). Let 
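$S = (r(\ell), c(\ell), N(\ell))_\ell$ be an input sequence. Recall the definition

$$\kappa(S) = \min\{i \ge 1 : r_i(\ell) = o(N(\ell)) \text{ as } \ell \to \infty\}$$

from (2) and note that

$$\kappa(S) \ge \kappa(S') \text{ for any subsequence } S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'} \text{ of } S. \tag{33}$$

Now let

$$\kappa'(S) = \min\left\{j \ge 1 : \limsup_{\ell \to \infty} c_j(\ell) \le 1\right\}.$$

We define $\kappa(S)$ and $\kappa'(S)$ since it suffices to look at the entries in $[1, \kappa(S) - 1] \times [1, \kappa'(S) - 1]$. Throughout this section, we assume that, as $\ell \to \infty$, we have

$$\sum_{(i,j) \in I} r_i^{\underline{2}}(\ell)\, c_j^{\underline{2}}(\ell) = O(N^2(\ell)) \quad \text{and} \quad \liminf_{\ell \to \infty} c_1(\ell) \ge 2 \quad \text{and} \quad r_1(\ell) = \Omega(N(\ell)). \tag{34}$$

These conditions immediately imply that

$$\kappa(S), \kappa'(S) \ge 2 \quad \text{and} \quad c_1(\ell), \kappa'(S) = O(1) \text{ as } \ell \to \infty. \tag{35}$$

First we show in Lemma 7.1 that if condition 1 of Theorem 1.1 is true, then condition 2 is necessary.

Lemma 7.1. For any sequence $S = (r(\ell), c(\ell), N(\ell))_\ell$ satisfying the conditions in (34), if there exists a subsequence $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ of $S$ for which $\sum_{i=\kappa(S')}^{\infty} r'_i(\ell') = o(N'(\ell'))$ as $\ell' \to \infty$ and $\limsup_{\ell' \to \infty} c'_1(\ell') \ge \kappa(S')$, then $\liminf_{\ell \to \infty} P(T \in \Omega_{r,c}) = 0$.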

Proof. Since $c_1(\ell) = O(1)$ as $\ell \to \infty$ and $\limsup_{\ell' \to \infty} c'_1(\ell') \ge \kappa(S')$, we have $\kappa(S') < \infty$. Also, there exists a subsequence $S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ of $S'$ having the following two properties:

(i) $\sum_{i=\kappa(S'')}^{\infty} r''_i(\ell'') = o(N''(\ell''))$ as $\ell'' \to \infty$.
(ii) $\lim_{\ell'' \to \infty} c''_1(\ell'') \ge \kappa(S') \ge \kappa(S'')$.

Property (ii) follows immediately by the definition of lim sup and the monotonicity of $\kappa$ from (33). Now note that, to establish property (i), we have that, as $\ell'' \to \infty$,

$$\sum_{i=\kappa(S'')}^{\infty} r''_i(\ell'') = \sum_{i=\kappa(S'')}^{\kappa(S')-1} r''_i(\ell'') + \sum_{i=\kappa(S')}^{\infty} r''_i(\ell'') \le r''_{\kappa(S'')}(\ell'')\,(\kappa(S') - \kappa(S'')) + o(N''(\ell'')) = o(N''(\ell'')),$$

where the inequality uses the fact that $r''_i(\ell'')$ is non-increasing with $i$ for any fixed $\ell''$ and that $\sum_{i=\kappa(S')}^{\infty} r'_i(\ell') = o(N'(\ell'))$ as $\ell' \to \infty$. Then, the last step follows since $\kappa(S') < \infty$ and $r''_i(\ell'') = o(N''(\ell''))$ as $\ell'' \to \infty$ for any $i \ge \kappa(S'')$ by the definition of $\kappa$. We will now show that

$$\lim_{\ell'' \to \infty} P(T \in \Omega_{r'',c''}) = 0,$$

which clearly implies that $\liminf_{\ell \to \infty} P(T \in \Omega_{r,c}) = 0$. Using the fact that $r_1(\ell) = \Omega(N(\ell))$, which gives that $r''_1(\ell'') = \Omega(N''(\ell''))$, we have

0(D = r , (f/)(r ^)-l) E ^(OM(O a > " E V')(4(0 - D- (36) 

Then, we have the following result for the probability that all entries $(i,j)$ with $i \ge \kappa(S'')$ and $j \le \kappa'(S'') - 1$ are 0:

/ oo «'(5")-l \ k'(S")-1 rv «(S")-l _// r/ «\ v^ 1 r" (P"\\ d J^-l 

p n n {^=o} = n (E ^ i( °:fi ,=i ^ ( ^~ 

V=L(5") '=1 / J=i (iV'^o-Efix^^))^ 2 

= 1 - o(l) as £" -> oo, (37) 

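since $c''_1(\ell'') = O(1)$ by (35), $\sum_{j'=1}^{j} c''_{j'}(\ell'') = O(1)$ for $j \le \kappa'(S'') - 1$ by (36), and $\sum_{i=\kappa(S'')}^{\infty} r''_i(\ell'') = o(N''(\ell''))$ by property (i) of $S''$. Conditioned on the event studied in (37), the entry $T_{i,1}$ is not zero only if $r''_i(\ell'')$ is not $o(N''(\ell''))$ (i.e., $i \le \kappa(S'') - 1$). Thus, since there are $\kappa(S'') - 1$ such rows and $\lim_{\ell'' \to \infty} c''_1(\ell'') \ge \kappa(S'')$, with probability $1 - o(1)$, there will be a non-binary entry $(i,1)$ with $i \le \kappa(S'') - 1$ for infinitely many values of $\ell''$. This gives that

$$\lim_{\ell'' \to \infty} P(T \in \Omega_{r'',c''}) \le 1 - \lim_{\ell'' \to \infty} P\left(\bigcap_{i=\kappa(S'')}^{\infty} \bigcap_{j=1}^{\kappa'(S'')-1} \{T_{i,j} = 0\}\right) = 0,$$

which concludes the proof of the lemma. □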

It remains to show that condition 2 is sufficient. We will first give three auxiliary lemmas and then will apply the lemmas to complete the proof of Proposition 4.3. In the first lemma, we consider the case $\sum_{i=\kappa(S)}^{\infty} r_i(\ell) = \Omega(N(\ell))$ and show that this implies $P(T \in \Omega_{r,c}) = \Omega(1)$. For this case, we have $\kappa(S) < \infty$.

Lemma 7.2. For any sequence $S = (r(\ell), c(\ell), N(\ell))_\ell$, if the conditions in (34) are satisfied and $\sum_{i=\kappa(S)}^{\infty} r_i(\ell) = \Omega(N(\ell))$ as $\ell \to \infty$, then $P(T \in \Omega_{r,c}) = \Omega(1)$.
Proof. We first show that, with probability bounded away from 0, $T_{i,j} = 0$ for each $i \in [1, \kappa(S) - 1]$ and $j \in [1, \kappa'(S) - 1]$. Let $X(\ell) = \sum_{i=1}^{\kappa(S)-1} r_i(\ell)$. Note that, by assumption, $N(\ell) - X(\ell) = \Omega(N(\ell))$. Therefore, there exists a constant $a \in (0, 1)$ such that, for all $\ell$ sufficiently large,

P n n {T M = 0} = — ^ l J ,7 J=1 '„ > a, (38) 

since by (35) we have $c_j = O(1)$ and $\sum_{j'=1}^{j} c_{j'} \le c_1 \kappa'(S) = O(1)$ for $j \le \kappa'(S) - 1$. Now, for all $j \ge \kappa'(S)$ and sufficiently large $\ell$, we have $c_j(\ell) \le 1$, and all the entries $(i,j)$ for which $i \le \kappa(S) - 1$ and $j \ge \kappa'(S)$ are binary with probability one. We can then conclude that the probability that all the entries for rows $i \le \kappa(S) - 1$ are binary is at least $a$. Once we have sampled all the rows for which $i \le \kappa(S) - 1$, we can then remove these rows and obtain new vectors $r'(\ell)$ and $c'(\ell)$ for each $\ell$ such that $\max_i r'_i(\ell) = o(N(\ell))$ (by the definition of $\kappa(S)$) and $\sum_i r'_i(\ell) = \sum_{i=\kappa(S)}^{\infty} r_i(\ell) = \Omega(N(\ell))$. Note that the sequence $(r'(\ell), c'(\ell), N'(\ell))_\ell$ falls into the setting of Proposition 4.2. Therefore, letting $T'$ be a table generated from the row and column sums $r'(\ell)$ and $c'(\ell)$, we obtain

$$P(T \in \Omega_{r,c}) \ge a\, P(T' \in \Omega_{r',c'}) = \Omega(1), \tag{39}$$

from (38) and Proposition 4.2. □

Now, in Lemma 7.3 we consider the case where $\sum_{i=\kappa(S)}^{\infty} r_i(\ell) = o(N(\ell))$ for which $r_i(\ell)$ is either $\Omega(N(\ell))$ or $o(N(\ell))$ for all $i \ge 1$ as $\ell \to \infty$. In this situation, we have that the condition $\limsup_{\ell \to \infty} c_1(\ell) < \kappa(S)$ is sufficient to obtain $P(T \in \Omega_{r,c}) = \Omega(1)$. The next lemma establishes this result when $\kappa(S) < \infty$.

Lemma 7.3. Consider a sequence $S = (r(\ell), c(\ell), N(\ell))_\ell$ satisfying (34) and for which $\kappa(S) < \infty$ and $r_i(\ell)$ is either $\Omega(N(\ell))$ or $o(N(\ell))$ as $\ell \to \infty$ for all $i \ge 1$. If $\sum_{i=\kappa(S)}^{\infty} r_i(\ell) = o(N(\ell))$ and $\limsup_{\ell \to \infty} c_1(\ell) \le \kappa(S) - 1$, then $P(T \in \Omega_{r,c}) = \Omega(1)$.

Proof. When $\sum_{i=\kappa(S)}^{\infty} r_i(\ell) = o(N(\ell))$, the same derivation that led to (37) gives that $P\left(\bigcap_{i=\kappa(S)}^{\infty} \bigcap_{j=1}^{\kappa'(S)-1} \{T_{i,j} = 0\}\right) = 1 - o(1)$. Therefore, when assigning a column $j \le \kappa'(S) - 1$, only the entries for rows $i \le \kappa(S) - 1$ will be non-zero with probability $1 - o(1)$. Since $\sum_{j=1}^{\kappa'(S)-1} c_j(\ell) \le c_1(\ell)\kappa'(S) = O(1)$ by (35) and $r_i(\ell) = \Omega(N(\ell))$ for all $i \le \kappa(S) - 1$, we have that there is a constant $a$ such that, for all large enough $\ell$, the probability that a given type-2 token for a column $j \le \kappa'(S) - 1$ is matched to a type-1 token for a given row $i \le \kappa(S) - 1$ is at least $a$ uniformly over all possible matchings for the other type-2 tokens from columns in $[1, \kappa'(S) - 1]$. Therefore,



^(S)-1k'(S)-1 

p ( n n i^ 1 ) 

i=l j=l 



k'(S)-1 \ k'(S)-1 



n n {^-=0} > n ( K % y^^m. m 



i=K(S) 3=1 I j=l 



We then obtain that, with probability bounded away from zero, all the entries for columns in $[1, \kappa'(S) - 1]$ are binary. For sufficiently large $\ell$, the remaining entries are all binary with probability one since, for $j \ge \kappa'(S)$, we have $\limsup_{\ell \to \infty} c_j(\ell) \le 1$. □

Now we turn to the case where $\kappa(S) = \infty$.

Lemma 7.4. Consider a sequence $S = (r(\ell), c(\ell), N(\ell))_\ell$ satisfying (34) and for which $\kappa(S) = \infty$ and $r_i(\ell) = \Omega(N(\ell))$ for all $i \le \limsup_{\ell \to \infty} c_1(\ell)$. Then $P(T \in \Omega_{r,c}) = \Omega(1)$.
Proof. We have $\sum_{j=1}^{\kappa'(S)-1} c_j(\ell) \le c_1(\ell)\kappa'(S) = O(1)$ by (35). Let $h = \limsup_{\ell \to \infty} c_1(\ell)$. Therefore, we have $c_j(\ell) \le h$ for all $\ell$ sufficiently large and all $j$. Also, since $r_1(\ell) = \Omega(N(\ell))$, there exists a constant $\epsilon > 0$ so that $\sum_{i=1}^{h} r_i(\ell) \ge r_1(\ell) \ge \epsilon N(\ell)$ for all sufficiently large $\ell$. Then, we can use a similar derivation as in (38) and show that the probability that all type-2 tokens for columns $j < \kappa'(S)$ are matched to type-1 tokens for rows $i \le h$ is at least $\prod_{j=1}^{\kappa'(S)-1} \left(\frac{\epsilon N(\ell) - \sum_{j'=1}^{j} c_{j'}(\ell)}{N(\ell)}\right)^{c_j(\ell)} = \Omega(1)$. Then, since $r_i(\ell) = \Omega(N(\ell))$ for all $i \le h$ and $h \ge c_1(\ell)$ for all large enough $\ell$, a derivation similar to (40) shows that all the entries for columns $j < \kappa'(S)$ are binary with probability bounded away from zero. Since for the remaining columns $j \ge \kappa'(S)$ we have $c_j(\ell) \le 1$ for all large enough $\ell$, all these entries are binary with probability one and we obtain $P(T \in \Omega_{r,c}) = \Omega(1)$. □

Now we use the three lemmas above to show that condition 2 is sufficient. Let $S' = (r'(\ell'), c'(\ell'), N'(\ell'))_{\ell'}$ be an arbitrary subsequence of $S = (r(\ell), c(\ell), N(\ell))_\ell$. Then, by condition 2 it holds that either

$$\limsup_{\ell' \to \infty} \sum_{i=\kappa(S')}^{\infty} \frac{r'_i(\ell')}{N'(\ell')} > 0 \quad \text{or} \quad \limsup_{\ell' \to \infty} c'_1(\ell') < \kappa(S'). \tag{41}$$

We want to show that

$$P(T \in \Omega_{r'',c''}) = \Omega(1) \text{ for some subsequence } S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''} \text{ of } S'. \tag{42}$$

Once this is established, since $S'$ is an arbitrary subsequence of $S$, we have that (42) holds for every subsequence of $S$. Then, using the subsubsequence principle of Lemma 4.4 we obtain that

$$P(T \in \Omega_{r,c}) = \Omega(1),$$

which concludes the proof of Proposition 4.3.

Our goal now is to prove (42). First, assume that $\kappa(S') < \infty$ and note that if

$$\limsup_{\ell' \to \infty} \sum_{i=\kappa(S')}^{\infty} \frac{r'_i(\ell')}{N'(\ell')} > 0,$$

then there exists a subsequence $S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ of $S'$ such that $\sum_{i=\kappa(S')}^{\infty} r''_i(\ell'') = \Omega(N''(\ell''))$. Since $\kappa(S') \ge \kappa(S'')$ as observed in (33), we have that $\sum_{i=\kappa(S'')}^{\infty} r''_i(\ell'') = \Omega(N''(\ell''))$, and (42) follows from Lemma 7.2. Therefore, we now assume that

$$\sum_{i=\kappa(S')}^{\infty} r'_i(\ell') = o(N'(\ell')),$$

which by the condition in (41) gives that $\limsup_{\ell' \to \infty} c'_1(\ell') < \kappa(S')$. By the definition of $\kappa(S')$, we have that $\limsup_{\ell' \to \infty} r'_i(\ell')/N'(\ell') > 0$ for all $i < \kappa(S')$. Since the $r'_i(\ell')$ are non-increasing with $i$ for any fixed $\ell'$, there exists a subsequence $S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ of $S'$ such that

$$r''_i(\ell'') = \Omega(N''(\ell'')) \text{ for all } i < \kappa(S').$$

Also, for this subsequence, we have $\kappa(S'') = \kappa(S')$ and, consequently, $\sum_{i=\kappa(S'')}^{\infty} r''_i(\ell'') = o(N''(\ell''))$. Moreover, we have that $\limsup_{\ell'' \to \infty} c''_1(\ell'') < \kappa(S'')$ by (33) and the fact that $\limsup_{\ell' \to \infty} c'_1(\ell') < \kappa(S')$. Then, from Lemma 7.3 we obtain (42).

When $\kappa(S') = \infty$, we take $s = \limsup_{\ell' \to \infty} c'_1(\ell')$ and, since $c'_1(\ell') = O(1)$, we can find a subsequence $\bar{S} = (\bar{r}(\bar{\ell}), \bar{c}(\bar{\ell}), \bar{N}(\bar{\ell}))_{\bar{\ell}}$ of $S'$ such that $\bar{r}_i(\bar{\ell}) = \Omega(\bar{N}(\bar{\ell}))$ as $\bar{\ell} \to \infty$ for all $i \le s$. If $\kappa(\bar{S}) < \infty$, the argument above holds and we can find a subsequence $S'' = (r''(\ell''), c''(\ell''), N''(\ell''))_{\ell''}$ of $\bar{S}$ (and, consequently, also of $S'$) for which (42) holds. Otherwise, we are in the setting of Lemma 7.4, which establishes (42) for $S'' = \bar{S}$ and concludes the proof of Proposition 4.3.



8 Conclusions 

We have characterized the input sequences for which the configuration model is suitable for uniformly sampling and counting binary contingency tables in optimal time (i.e., linear as a function of the number $N$ of unit entries in the table). Surprisingly, given known results for the case of symmetric tables, having a bounded number of expected double edges in the table is only a necessary condition for optimality, not a sufficient one. It turns out that the full characterization for optimality in the non-symmetric case relates to the behavior of very big rows (i.e., rows of size $\Omega(N)$). Allowing this type of growth introduces significant qualitative differences between symmetric and non-symmetric tables. In turn, such differences give rise to technical challenges that are not present in the symmetric case. Our results also have important practical implications in applied settings that demand the development of easy-to-implement and fast algorithms for uniform generation of binary contingency tables.
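To make the practical point concrete, the following is a minimal sketch of the configuration model used as a rejection sampler: tokens are paired uniformly at random and the resulting table is kept only if it is binary. The function names, the retry cap `max_tries`, and the use of Python's `random.Random` are assumptions of this illustration, not part of the analysis above.

```python
import random


def configuration_model_table(r, c, rng):
    """Pair type-1 tokens (rows) with type-2 tokens (columns) uniformly at
    random and return the resulting table of multiplicities."""
    tokens1 = [i for i, ri in enumerate(r) for _ in range(ri)]
    tokens2 = [j for j, cj in enumerate(c) for _ in range(cj)]
    assert len(tokens1) == len(tokens2), "row and column sums must agree"
    rng.shuffle(tokens2)  # a uniformly random pairing of the tokens
    table = [[0] * len(c) for _ in r]
    for i, j in zip(tokens1, tokens2):
        table[i][j] += 1
    return table


def is_binary(table):
    """True when every entry of the table is 0 or 1."""
    return all(x <= 1 for row in table for x in row)


def sample_binary_table(r, c, rng, max_tries=10_000):
    """Repeat the configuration model until the output is binary.

    Returns None if no binary table is produced within max_tries attempts.
    """
    for _ in range(max_tries):
        table = configuration_model_table(r, c, rng)
        if is_binary(table):
            return table
    return None


rng = random.Random(0)
table = sample_binary_table([2, 1, 1], [2, 1, 1], rng)
for row in table:
    print(row)
```

Each attempt costs time linear in $N$, so the expected running time is linear exactly when the acceptance probability stays bounded away from zero, which is the regime characterized in this paper.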

We conclude by mentioning two open problems. Since, as we mentioned in Section 1, there is no need to employ a complicated sequential importance sampling procedure when the conditions above hold, it is interesting to know whether one can construct more specialized importance sampling procedures to obtain a low-complexity polynomial algorithm when the conditions in Theorem 1.1 are not satisfied. Another open problem consists in counting the number of (not necessarily binary) contingency tables. In particular, it would be interesting to know whether a necessary and sufficient condition like the one obtained in this paper can be derived for that case. We remark that for general contingency tables, the configuration model does not generate a table uniformly at random, which makes the problem more challenging.